geodezyx.stats package

Submodules

geodezyx.stats.least_squares module

@author: psakic

This sub-module of geodezyx.stats contains functions for least-squares processing.

it can be imported directly with: from geodezyx import stats

The geodezyx toolbox is a software for simple but useful functions for Geodesy and Geophysics under the GNU LGPL v3 License

Copyright (C) 2019 Pierre Sakic et al. (IPGP, sakic@ipgp.fr) GitHub repository : https://github.com/IPGP/geodezyx

geodezyx.stats.least_squares.bins_middle(bin_edges)

Calculate the middle points of histogram bins.

Parameters:

bin_edges (array-like) – Bin edge values from numpy.histogram.

Returns:

List of bin center values (midpoints between consecutive edges).

Return type:

list

geodezyx.stats.least_squares.chi2_test_frontend(dist_inp, nbins=10, ddof=2, debug=0, mode2=False, aaa=1)

Perform chi-square goodness-of-fit test on a distribution.

Parameters:
  • dist_inp (array-like) – Input distribution data.

  • nbins (int, optional) – Number of bins for histogram. Default is 10.

  • ddof (int, optional) – Degrees of freedom correction. Default is 2.

  • debug (int, optional) – If non-zero, plot debug visualization. Default is 0.

  • mode2 (bool, optional) – Normalization mode. If False (default), normalize theoretical distribution. If True, normalize observed distribution (less common, may be incoherent). Default is False.

  • aaa (float, optional) – Scaling factor for standard deviation. Default is 1.

Returns:

(chi2_statistic, p_value) from scipy.stats.chisquare, or (nan, nan) if error.

Return type:

tuple

Warning

The method for generating theoretical values is somewhat heuristic. See MATLAB’s chi2gof.m line 185 for reference.

Notes

In debug mode, returns: bin_edges, bin_edges2, hist, gauss, chi2 Otherwise returns chi2 statistic and p-value only.

geodezyx.stats.least_squares.chi2_test_lsq(V, A, P=None, fuvin=None, risk=0.05, cleaning_std=False, cleaning_normalized=False, koefP=1)

Perform chi-square test on least squares residuals.

Parameters:
  • V (array-like) – Residuals vector.

  • A (ndarray) – Jacobian matrix.

  • P (array-like, optional) – Weight matrix (diagonal only). Default is None.

  • fuvin (float, optional) – Unitary variance factor. If None, will be calculated. Default is None.

  • risk (float, optional) – Significance level for chi-square test. Default is 0.05.

  • cleaning_std (bool, optional) – If True, clean data by removing standard deviation outliers. Default is False.

  • cleaning_normalized (bool, optional) – If True, clean data using normalized residuals (preferred over cleaning_std). Default is False.

  • koefP (float, optional) – Coefficient applied to P for finding viable solution. Default is 1.

Returns:

Test statistics if successful, None if both fuvin and P are None.

Return type:

tuple or None

Notes

P should be the diagonal of the weight matrix, not the full matrix.

Cleaning options are tricks to approach FUV=1 by removing worst values. cleaning_normalized is preferred and overrides cleaning_std.

If koefP != 1, the adjusted P is returned as second argument.

geodezyx.stats.least_squares.clean_nan(A, L)

Deprecated since version This: function is discontinued. Use nan_cleaner() instead.

Remove NaN values from a 2D array and corresponding 1D array.

Parameters:
  • A (ndarray) – 2D array to clean.

  • L (ndarray) – 1D array to clean.

Returns:

  • A_clean (ndarray) – 2D array with NaN values removed.

  • L_clean (ndarray) – 1D array with corresponding NaN values removed.

Notes

Mutually removes NaN values from both A and L at the same indices.

geodezyx.stats.least_squares.constraint_improve_N(N, C, trans=False, outsparsetype='csc')

give N normal matrix and C constraints matrix returns N compined with C trans is a (dirty) way to transpose C if made in wrong shape

convention Ghilani 2011 p424 :

N C.T C 0

geodezyx.stats.least_squares.ellipse_angle_of_rotation(a, outdeg=True)

Calculate the rotation angle of an ellipse from its equation parameters.

Parameters:
  • a (array-like) – Ellipse equation coefficients [a, b, c, d, f, g].

  • outdeg (bool, optional) – If True, return angle in degrees. If False, return in radians. Default is True.

Returns:

Rotation angle of the ellipse.

Return type:

float

References

http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

geodezyx.stats.least_squares.ellipse_axis_length(a)

Calculate the semi-major and semi-minor axis lengths of an ellipse.

Parameters:

a (array-like) – Ellipse equation coefficients [a, b, c, d, f, g].

Returns:

Array of [semi_major_axis, semi_minor_axis].

Return type:

ndarray

References

http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

geodezyx.stats.least_squares.ellipse_center(a)

Calculate the center coordinates of an ellipse from its equation parameters.

Parameters:

a (array-like) – Ellipse equation coefficients [a, b, c, d, f, g] from the general ellipse equation: a*x^2 + 2*b*xy + c*y^2 + 2*d*x + 2*f*y + g = 0

Returns:

Center coordinates [x0, y0].

Return type:

ndarray

References

http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

geodezyx.stats.least_squares.ellipse_fit(x, y)

Fit an ellipse to a set of 2D points.

Parameters:
  • x (array-like) – X coordinates of the points to fit.

  • y (array-like) – Y coordinates of the points to fit.

Returns:

(a, b, phi, x0, y0) where:

  • afloat

    Semi-major axis length

  • bfloat

    Semi-minor axis length

  • phifloat

    Rotation angle in degrees

  • x0float

    X coordinate of the center

  • y0float

    Y coordinate of the center

Return type:

tuple

References

http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

geodezyx.stats.least_squares.ellipse_get_coords(a=0.0, b=0.0, x=0.0, y=0.0, angle=0.0, k=2, out_separate_X_Y=True, trigo=True)

Draws an ellipse using (360*k + 1) discrete points.

Parameters:
  • a (float) – major axis distance

  • b (float) – minor axis distance

  • x (float) – offset along the x-axis

  • y (float) – offset along the y-axis

  • angle (float) –

    trigo/clockwise rotation [in degrees] of the ellipse; - angle=0 : the ellipse is aligned with the positive x-axis - angle=30 : rotated 30 degrees trigo/clockwise from positive x-axis

    trigo sense is the standard convention

  • k (int) – k = 1 means 361 points (degree by degree)

  • out_separate_X_Y (bool) – if True, returns separate X and Y arrays

  • trigo (bool) – if True, use trigonometric convention; if False, use clockwise convention

Notes

This function is based on pseudo code given at http://en.wikipedia.org/wiki/Ellipse

The internal convention is clockwise, but we prefer trigo convention for the Ghiliani ellipses made by error_ellipse_parameters

References

scipy-central.org/item/23/1/plot-an-ellipse

geodezyx.stats.least_squares.error_ellipse(xm, ym, sigx, sigy, sigxy, nsig=1, ne=100, scale=1)

from matlab fct http://kom.aau.dk/~borre/matlab/geodesy/errell.m It works but don’t ask why …

(X,Y) orientation convention is inverted => (Y,X) … so in a practical way you must invert X ,Y (it is not important for the axis but it is for the orientation) AND sigx,sigy,sigxy must be first normalized with the fuv

sigx, sigy, sigxy :

so as we can generate a covariance matrix cov = np.array([[sigx ** 2,sigxy],[sigxy,sigy ** 2]])

ne :

nb of segements for the ellipse

RETURNS :

xe,ye,dx2,dy2

DEBUG:

si on a xe1,ye1,_,_ = stats.error_ellipse(pxp[0],pxp[1], sigxB , sigyB , sigxyB, scale= 10000) xe2,ye2,_,_ = stats.error_ellipse(pxp[0],pxp[1], sigyB , sigxB , sigxyB, scale= 10000) et PAS les - à D et dxy0 => on a 2 ellipses differentes

si on a xe1,ye1,_,_ = stats.error_ellipse(pxp[0],pxp[1], sigxB , sigyB , sigxyB, scale= 10000) xe2,ye2,_,_ = stats.error_ellipse(pxp[0],pxp[1], sigyB , sigxB , sigxyB, scale= 10000) et AVEC les - à D et dxy0 => on a 2 ellipses differentes au moins une ellipse coincide avec celle de Ghiliani

A investiguer, en attendant, à éviter

geodezyx.stats.least_squares.error_ellipse_parameters(qxx, qyy, qxy, fuv, out_t=False)

Calculate error ellipse parameters from variance-covariance matrix elements.

Parameters:
  • qxx (float) – Variance in x direction (from variance-covariance matrix, unnormalized).

  • qyy (float) – Variance in y direction (from variance-covariance matrix, unnormalized).

  • qxy (float) – Covariance between x and y (from variance-covariance matrix, unnormalized).

  • fuv (float) – Unitary variance factor for normalization.

  • out_t (bool, optional) – If False (default), return angle in trigonometric convention (counterclockwise from x-axis). If True, return angle in clockwise direction from y-axis. Default is False.

Returns:

  • Su (float) – Semi-major axis (scaled by FUV).

  • Sb (float) – Semi-minor axis (scaled by FUV).

  • t or phi (float) –

    • If out_t=True: t (angle in clockwise direction from y-axis in degrees)

    • If out_t=False: phi (angle in trigonometric direction from x-axis in degrees)

Notes

At least one ellipse should coincide with that of Ghilani for verification.

This function calculates the principal axes and orientation of the 2D confidence ellipse from the variance-covariance matrix.

References

Strang & Borre methods for comparison.

geodezyx.stats.least_squares.error_ellipse_parameters_2(sigx, sigy, sigxy, out_deg=True)

Calculate error ellipse parameters using Strang & Borre convention.

Parameters:
  • sigx (float) – Standard deviation in x direction (must be normalized with FUV).

  • sigy (float) – Standard deviation in y direction (must be normalized with FUV).

  • sigxy (float) – Covariance between x and y (must be normalized with FUV).

  • out_deg (bool, optional) – If True, return angles in degrees. If False, return in radians. Default is True.

Returns:

(semi_major_axis, semi_minor_axis, angle)

Return type:

tuple

Warning

The (X,Y) orientation convention is inverted from standard use, so in practice you must invert X and Y inputs. This is not important for the axes magnitudes but IS important for the orientation angle.

Notes

sigx, sigy, and sigxy must be pre-normalized with the FUV.

References

Strang & Borre p. 337

geodezyx.stats.least_squares.fitEllipse_core(x, y)

Core function to fit an ellipse to 2D points using least squares.

Parameters:
  • x (array-like) – X coordinates of the points.

  • y (array-like) – Y coordinates of the points.

Returns:

Ellipse equation coefficients [a, b, c, d, f, g].

Return type:

ndarray

References

http://nicky.vanforeest.com/misc/fitEllipse/fitEllipse.html

geodezyx.stats.least_squares.fuv_calc(V, A, P=1, normafuv=1)

Calculate the unitary variance factor (Facteur Unitaire de Variance).

Parameters:
  • V (array-like) – Residuals vector.

  • A (ndarray) – Jacobian matrix.

  • P (int, float, or ndarray, optional) – Weight matrix or weight value. Can be a scalar, array, or sparse matrix. Default is 1.

  • normafuv (int or float, optional) – Normalization factor for the FUV. Default is 1.

Returns:

The unitary variance factor (FUV).

Return type:

float

Notes

The FUV depends on the weight matrix P:

  • With weight of 10**-6: FUV = 439828.260843

  • With weight of 1: FUV = 4.39828260843e-07

But sigmas remain constant regardless of P weights.

Both standard arrays and sparse arrays are supported for P.

geodezyx.stats.least_squares.fuv_calc_OLD(V, A)

Deprecated since version Use: fuv_calc() instead.

Legacy implementation of unitary variance factor calculation.

Parameters:
  • V (array-like) – Residuals vector.

  • A (ndarray) – Jacobian matrix.

Returns:

The unitary variance factor.

Return type:

float

geodezyx.stats.least_squares.fuv_calc_OLD2(V, A, P=None)

Deprecated since version Use: fuv_calc() instead.

Legacy implementation of unitary variance factor with weight matrix.

Parameters:
  • V (array-like) – Residuals vector.

  • A (ndarray) – Jacobian matrix.

  • P (ndarray, optional) – Weight matrix. Default is None (identity matrix).

Returns:

The unitary variance factor.

Return type:

float

geodezyx.stats.least_squares.get_accur_coeff(i)

Get accuracy coefficients for finite difference calculations.

Parameters:

i (int) – Accuracy order index. Values > 3 return the highest accuracy coefficients.

Returns:

Array of finite difference coefficients.

Return type:

ndarray

References

https://en.wikipedia.org/wiki/Finite_difference_coefficient

geodezyx.stats.least_squares.jacobian(f, var_in_list, var_out, kwargs_f_list=[], h=1e-06, nproc=4)

Compute Jacobian matrix in parallel using multiple processes.

Parameters:
  • f (callable) – The function for which to compute the Jacobian.

  • var_in_list (iterable) – List of variables with respect to which derivation is performed.

  • var_out (int) – Output index to consider.

  • kwargs_f_list (list of dict, optional) – List of keyword argument dictionaries. Default is [].

  • h (float, optional) – Derivation step. Default is 10**-6.

  • nproc (int, optional) – Number of processes for parallel computation. Default is 4.

Returns:

Jacobian matrix with shape (n_observations, n_variables).

Return type:

ndarray

Notes

Only keyword arguments (kwargs) are managed. Positional arguments are not currently supported.

geodezyx.stats.least_squares.jacobian_line(f, var_in_list, var_out=0, kwargs_f={}, args_f=[], h=0, aray=True)

Compute a line of the Jacobian matrix (derivatives with respect to multiple variables).

Parameters:
  • f (callable) – The Python function to differentiate.

  • var_in_list (iterable) – List/tuple of variables with respect to which derivation is performed.

  • var_out (int, optional) – The output index of f to consider. Default is 0.

  • kwargs_f (dict, optional) – Dictionary of keyword arguments for f. Default is {}.

  • args_f (iterable, optional) – List/tuple of positional arguments for f. Default is [].

  • h (float, optional) – Derivation step. Default is 0 (auto-calculated).

  • aray (bool, optional) – If True, return result as numpy array. If False, return as list. Default is True.

Returns:

Array or list of partial derivatives with respect to each variable in var_in_list.

Return type:

ndarray or list

See also

partial_derive

Compute single partial derivative

geodezyx.stats.least_squares.kwargs_for_jacobian(kwdic_generik, kwdic_variables)

Build a list of kwargs dictionaries for Jacobian computations.

Parameters:
  • kwdic_generik (dict) – Dictionary of parameters that will not change across computations.

  • kwdic_variables (dict) – Dictionary of parameters that will change, with iterables as values. Each key should map to an iterable of values to be tested.

Returns:

List of keyword argument dictionaries with all combinations of variables from kwdic_variables merged with kwdic_generik.

Return type:

list of dict

Notes

Creates a Cartesian product of all iterables in kwdic_variables.

geodezyx.stats.least_squares.nan_cleaner(Ain, Bin)

Remove NaN values from two arrays/lists simultaneously.

Parameters:
  • Ain (array-like) – First input array or list.

  • Bin (array-like) – Second input array or list.

Returns:

  • Aout (ndarray) – First array with NaN values removed.

  • Bout (ndarray) – Second array with NaN values removed at the same indices as Ain.

Notes

Removes all rows where either Ain or Bin contains a NaN value.

geodezyx.stats.least_squares.partial_derive(f, var_in, var_out=0, kwargs_f={}, args_f=[], h=0, accur=-1)

Compute partial derivatives of a Python function numerically.

Parameters:
  • f (callable) – The Python function to differentiate. Must return a scalar or iterable. Parameters susceptible to be differentiated must be scalars. For instance, to differentiate a position vector X = [x,y,z], f must take arguments f(x,y,z) not f(X).

  • var_in (int or str) – The variable with respect to which the derivation is performed. Can be an int (starting with 0) or a string describing the name of the variable in f’s arguments.

  • var_out (int, optional) – The output index of f to consider. Default is 0.

  • kwargs_f (dict, optional) – Dictionary describing keyword arguments of f. Default is {}.

  • args_f (iterable, optional) – Tuple/list describing positional arguments of f. Default is [].

  • h (float, optional) – Derivation step. If h == 0, uses x * sqrt(epsilon). Default is 0. See http://en.wikipedia.org/wiki/Numerical_differentiation

  • accur (int, optional) – Accuracy coefficient index. -1 provides best accuracy (slowest). Default is -1. See https://en.wikipedia.org/wiki/Finite_difference_coefficient

Returns:

The derivative of f with respect to var_in.

Return type:

float

Notes

The function automatically adjusts h if it equals 0 using: h = x * sqrt(machine_epsilon)

References

http://en.wikipedia.org/wiki/Numerical_differentiation https://en.wikipedia.org/wiki/Finite_difference_coefficient

geodezyx.stats.least_squares.partial_derive_old(f, var_in, var_out=0, kwargs_f={}, args_f=[], h=0)

Deprecated since version Use: partial_derive() instead.

Legacy implementation for computing partial derivatives of a Python function.

Parameters:
  • f (callable) – The Python function to differentiate.

  • var_in (int or str) – Variable with respect to which derivation is performed. Can be an int (starting with 0) or a string name.

  • var_out (int, optional) – The output index of f to consider. Default is 0.

  • kwargs_f (dict, optional) – Dictionary of keyword arguments for f. Default is {}.

  • args_f (iterable, optional) – Tuple/list of positional arguments for f. Default is [].

  • h (float, optional) – Derivation step. If h == 0, uses x * sqrt(epsilon). Default is 0.

Returns:

The derivative of f with respect to var_in.

Return type:

float

References

http://en.wikipedia.org/wiki/Numerical_differentiation

geodezyx.stats.least_squares.sigmas_formal_calc(N, V, A, fuv=None, P=None)

Calculate formal standard deviations (sigmas) from least squares solution.

Parameters:
  • N (ndarray) – Normal equation matrix (A^T * P * A).

  • V (array-like) – Residuals vector.

  • A (ndarray) – Jacobian matrix.

  • fuv (float, optional) – Unitary variance factor. If None, it will be calculated. Default is None.

  • P (ndarray, optional) – Weight matrix or array. Default is None.

Returns:

Array of formal standard deviations (sigmas) for each parameter.

Return type:

ndarray

See also

fuv_calc

Calculate the unitary variance factor

geodezyx.stats.least_squares.smart_i_giver(subgrp_len_list, i_in_sublis, sublis_id, advanced=False, sublis_id_list=[])

Convert a local index within a subgroup to a global index.

Parameters:
  • subgrp_len_list (list) – List of lengths for each subgroup.

  • i_in_sublis (int) – Local index within the subgroup.

  • sublis_id (int or hashable) – The identifier of the subgroup to access.

  • advanced (bool, optional) – If True, sublis_id is a generic identifier (str, int, set, etc.) present in sublis_id_list. If False, sublis_id is an integer index. Default is False.

  • sublis_id_list (list, optional) – List of subgroup identifiers (used when advanced=True). Default is [].

Returns:

Global index.

Return type:

int

Examples

>>> subgrp_len_list = [4201, 4186, 4157, 4041, 4058, 4204, 4204, 4204, 4204]
>>> i_in_sublis = 2
>>> sublis_id = 3
>>> smart_i_giver(subgrp_len_list, i_in_sublis, sublis_id)
12544  # sum of [4201 + 4186 + 4157 + 2]
must be an int
geodezyx.stats.least_squares.triangle_arr2vect(triarrin, k=1)

Convert upper triangular matrix to 1D vector.

Parameters:
  • triarrin (ndarray) – Input 2D triangular array.

  • k (int, optional) – Diagonal offset. k=0 includes the main diagonal, k=1 excludes it. Default is 1.

Returns:

1D vector of upper triangular elements.

Return type:

ndarray

See also

numpy.triu_indices_from

Get indices of upper triangle

geodezyx.stats.least_squares.weight_mat(Sinp, Ninp=[], fuvinp=1, sparsediag=False)
Args :

Sinp : liste des Sigmas sig = sqrt(var) Ninp : liste de la taille de chaque blocs (obs) fuvinp = 1 : facteur unitaire de variance inspiré de mat_poids , fct écrite dans la lib resolution de GPShApy

Returns :

K : matrice de var-covar Q : matrice des cofacteurs p : matrice des poids inv(Q)

geodezyx.stats.least_squares.weight_mat_simple(Pinp, Ninp=[], sparsediag=False, return_digaonal_only=False)

Simple version of weight_mat : takes directly the weights (Pinp) and the size for each weigths blocks (Ninp)

Pinp and Ninp have to have the same length

Args :

Pinp : list of weigths Ninp : list of the size of each block (obs number) fuvinp = 1 : facteur unitaire de variance

Returns :

p : weigth matrix

geodezyx.stats.stats module

@author: psakic

This sub-module of geodezyx.stats contains functions for low-level statistics.

it can be imported directly with: from geodezyx import stats

The geodezyx toolbox is a software for simple but useful functions for Geodesy and Geophysics under the GNU LGPL v3 License

Copyright (C) 2019 Pierre Sakic et al. (IPGP, sakic@ipgp.fr) GitHub repository : https://github.com/IPGP/geodezyx

geodezyx.stats.stats.RMSmean(indata)

Calculate the root mean square (RMS) mean of a list or array.

Parameters:

indata (array-like) – Input values.

Returns:

The RMS mean value.

Return type:

float

Warning

This function is redundant with rms_mean. Use rms_mean instead.

Notes

The RMS mean is defined as: sqrt(mean(indata^2)) NaN values are ignored in the calculation.

geodezyx.stats.stats.butter_lowpass(cutoff, fs, order=5)

Design a Butterworth low-pass filter.

Parameters:
  • cutoff (float) – Cutoff frequency.

  • fs (float) – Sampling frequency.

  • order (int, optional) – Order of the filter. The default is 5.

Returns:

  • b (numpy.ndarray) – Numerator coefficients of the IIR filter.

  • a (numpy.ndarray) – Denominator coefficients of the IIR filter.

Notes

This function uses scipy.signal.butter to design the filter.

geodezyx.stats.stats.butter_lowpass_filter(data, cutoff, fs, order=5)

Apply a Butterworth low-pass filter to data.

Parameters:
  • data (array-like) – Input data to be filtered.

  • cutoff (float) – Cutoff frequency.

  • fs (float) – Sampling frequency.

  • order (int, optional) – Order of the filter. The default is 5.

Returns:

y – Filtered data.

Return type:

numpy.ndarray

See also

butter_lowpass

Design the Butterworth low-pass filter.

geodezyx.stats.stats.color_of_season(datein)

Get a color representation of the season.

Parameters:

datein (datetime or date) – The date to get the season color for.

Returns:

Color code: ‘b’ (blue) for winter, ‘r’ (red) for summer, ‘g’ (green) for spring, ‘k’ (black) for autumn.

Return type:

str

Notes

Useful for plotting time series with color-coded seasons.

See also

get_season

Get the season name for a given date.

geodezyx.stats.stats.confid_interval_slope(x, y, alpha=0.95)

Calcule un intervalle de confiance sur une tendance

Parameters:
  • x (array) – la variable indépendante

  • y (array) – la variable dépendante

  • alpha (float) – la probabilité d’erreur tolérée

Returns:

  • mi (float) – la borne inférieure de l’intervalle

  • ma (float) – la borne supérieure de l’intervalle

References

http://www.i4.auc.dk/borre/matlab http://kom.aau.dk/~borre/matlab/

geodezyx.stats.stats.dates_middle(start, end)

Calculate the middle date between two dates.

Parameters:
  • start (datetime) – Start date.

  • end (datetime) – End date.

Returns:

The middle date between start and end.

Return type:

datetime

geodezyx.stats.stats.detrend_timeseries(X, Y)

detrend, i.e. remove linear tendence of a timeserie Y(X)

Parameters:

Y (X &) – Values

Returns:

X & Yout – Detrended Y

Return type:

list or numpy.array

geodezyx.stats.stats.find_intersection(x1, y1, x2, y2)

Find intersection points of two curves defined by points (x1, y1) and (x2, y2).

Parameters:
  • x1 (array-like) – X coordinates of the first curve.

  • y1 (array-like) – Y coordinates of the first curve.

  • x2 (array-like) – X coordinates of the second curve.

  • y2 (array-like) – Y coordinates of the second curve.

Returns:

  • roots (numpy.ndarray) – X coordinates of intersection points.

  • y_vals (numpy.ndarray) – Y coordinates of intersection points.

References

http://stackoverflow.com/questions/8094374/python-matplotlib-find-intersection-of-lineplots

geodezyx.stats.stats.gaussian_filter_GFZ_style_smoother(tim_ref, dat_ref, width=7)

Gaussian filter to smooth data based on GFZ’s GMT_plus.pm/gaussian_kernel.

This is a deprecated slow version. Use gaussian_filter_GFZ_style_smoother_improved instead.

Parameters:
  • tim_ref (array-like) – The X/T component of the time series (in decimal days).

  • dat_ref (array-like) – The Y component (the data).

  • width (int, optional) – Size of the window. An odd number is best. The default is 7.

Returns:

dat_smt – Smoothed Y values.

Return type:

list

Warning

This version is very slow (dirty conversion of a Perl function). Use gaussian_filter_GFZ_style_smoother_improved instead.

Notes

Some other nice ideas for smoothing: http://scipy-cookbook.readthedocs.io/items/SignalSmooth.html https://stackoverflow.com/questions/20618804/how-to-smooth-a-curve-in-the-right-way https://stackoverflow.com/questions/32900854/how-to-smooth-a-line-using-gaussian-kde-kernel-in-python-setting-a-bandwidth

See also

gaussian_filter_GFZ_style_smoother_improved

Pythonic version that should be used.

geodezyx.stats.stats.gaussian_filter_GFZ_style_smoother_improved(tim_ref, dat_ref, width=7)

Gaussian filter to smooth data, based on GFZ’s GMT_plus.pm/gaussian_kernel

Parameters:
  • tim_ref (iterable (list or array)) – the X/T component of the time serie (in decimal days!)

  • dat_ref (iterable (list or array)) – the Y component (the data).

  • width (int, optional) – size of the window (odd number is best ?). The default is 7.

Returns:

dat_smt2 – smoothed Y.

Return type:

array

geodezyx.stats.stats.get_season(now)

Determine the season for a given date.

Parameters:

now (datetime or date) – The date to determine the season for.

Returns:

The season name: ‘winter’, ‘spring’, ‘summer’, or ‘autumn’.

Return type:

str

Notes

Season boundaries are based on the Northern Hemisphere calendar: - Winter: Dec 21 - Mar 20 - Spring: Mar 21 - Jun 20 - Summer: Jun 21 - Sep 22 - Autumn: Sep 23 - Dec 20

geodezyx.stats.stats.harmonic_mean(A)

Calculate the harmonic mean of a list or array.

Parameters:

A (array-like) – Input values.

Returns:

The harmonic mean of A.

Return type:

float

Notes

The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals.

geodezyx.stats.stats.lagrange1(points)

Low level function to determine a lagrangian polynom

Replace scipy.interpolate.lagrange which is HIGHLY instable

Parameters:

points (list of n-interable) – point list.

Returns:

  • p (function) – function representing the polynom.

  • Source

  • ——

  • from (https://gist.github.com/melpomene/2482930)

geodezyx.stats.stats.lagrange2(X, Y)

Low level function to determine a lagrangian polynom

Replace scipy.interpolate.lagrange which is HIGHLY instable

this function is more pythonic, but slower thant lagrange1….

Parameters:

points (list of n-interable) – point list.

Returns:

  • p (function) – function representing the polynom.

  • Source

  • ——

  • from (https://gist.github.com/melpomene/2482930)

geodezyx.stats.stats.lagrange_interpolate(tdata, ydata, titrp, n=10, t_type='datetime')

Perform a temporal lagrangian interpolation the X-component is a time

Parameters:
  • tdata (iterable of datetime) – X/T component of the known points.

  • ydata (iterable of floats) – Y component of the known points..

  • titrp (iterable of datetime) – Epochs of the wished points.

  • n (int, optional) – degree of the polynom. Better if even. The default is 10.

  • t_type (str, optional) – type of the time component, can be “datetime”, “posix” or “pandas_timestamp”. The default is “datetime”. pandas_timestamp is recommended for a more precise applications (nanosecond precision instead of microsecond for datetime)

Returns:

  • Yintrp (float array) – output interpolated data.

  • Tips

  • —-

  • Use conv.dt_range to generate the wished epochs range

geodezyx.stats.stats.linear_coef_a_b(x1, y1, x2, y2)

Gives coefficients of the line between two points (x1,y1) & (x2,y2) x1,y1,x2,y2 can be iterables

Parameters:
  • x1 (float or list or numpy.array) – Coordinates of the 1st and the 2nd point

  • y1 (float or list or numpy.array) – Coordinates of the 1st and the 2nd point

  • x2 (float or list or numpy.array) – Coordinates of the 1st and the 2nd point

  • y2 (float or list or numpy.array) – Coordinates of the 1st and the 2nd point

Returns:

  • a (float) – regression coefficient

  • b1 & b2 (float) – regression offsets coefficient (b1 must be equal to b2)

geodezyx.stats.stats.linear_reg_getvalue(X, a, b, full=True)

From a vector X and coefficients a & b, get Y = a*X + b

Parameters:
  • X (list or numpy.array) – Values

  • b (a &) – Linear regression coefficients

  • full (bool) – True : return X , Y = aX + b , False : return Y = aX + b

Returns:

  • Y (numpy.array) – if full == False

  • OR

  • X , Y (numpy.array) – if full == True

Note

Unstable while working with POSIX Time as X-data (too heigh values ? …) Decimal Years are recommended

geodezyx.stats.stats.linear_regression(x, y, fulloutput=False, simple_lsq=False, alpha=0.95)

Performs linear regression on two vectors, X and Y, and returns the coefficients a (slope) and b (intercept).

Parameters:
  • x (list or numpy.array) – The X values.

  • y (list or numpy.array) – The Y values.

  • simple_lsq (bool, optional) – If True, performs a basic, low-level least square inversion (faster, but less outputs). If False, calls scipy’s linregress. Default is False.

  • fulloutput (bool, optional) – If True, returns additional outputs (confidence interval for the slope and standard deviation). Default is False.

  • alpha (float, optional) – The alpha value for the confidence interval. Default is .95.

Returns:

  • slope (float) – The slope (a) of the linear regression.

  • intercept (float) – The intercept (b) of the linear regression.

  • confid_interval_slope (tuple of float, optional) – The confidence interval for the slope. Only returned if fulloutput is True.

  • std_err (float, optional) – The standard deviation. Only returned if fulloutput is True.

Notes

This function performs a similar job to scipy.stats.linregress.

Regarding computation speed: low-level least square inversion is faster for small datasets. For larger datasets, scipy’s linregress is faster (n points > ~13000).

geodezyx.stats.stats.mad(data, mode='median')

Calculate the Median Absolute Deviation (MAD) of a list or array.

Parameters:
  • data (array-like) – Input values.

  • mode (str, optional) – Calculation mode. Can be ‘median’ or ‘mean’. The default is ‘median’.

Returns:

The Median Absolute Deviation value.

Return type:

float

Notes

When mode is ‘median’: MAD = median(|data - median(data)|) When mode is ‘mean’: MAD = mean(|data - mean(data)|) NaN values are ignored in the calculation.

References

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

geodezyx.stats.stats.movingaverage(values, window)

Calculate moving average using convolution (INTERNAL_ID_1).

Parameters:
  • values (array-like) – Input values.

  • window (int) – Window size for the moving average.

Returns:

smas – Moving average calculated with ‘valid’ mode. The output size will be len(values) - window + 1.

Return type:

numpy.ndarray

Notes

The ‘valid’ mode requires enough datapoints. For example, if you remove ‘valid’, it will start at point one without any prior points. This implementation returns a shorter array than the input.

References

http://sentdex.com/sentiment-analysisbig-data-and-python-tutorials-algorithmic-trading/how-to-chart-stocks-and-forex-doing-your-own-financial-charting/calculate-simple-moving-average-sma-python/

geodezyx.stats.stats.movingaverage_bis(interval, window_size, convolve_mode='same')

Calculate moving average using convolution with specified mode (INTERNAL_ID_4).

Parameters:
  • interval (array-like) – Input values.

  • window_size (int) – Size of the convolution window.

  • convolve_mode (str, optional) – Mode for convolution: ‘same’, ‘valid’, or ‘full’. The default is ‘same’, which returns output with same size as input.

Returns:

Moving average with specified convolution mode applied.

Return type:

numpy.ndarray

Notes

Slower than other methods but gives output of the same size as input. The ‘same’ mode is recommended for most applications.

References

https://stackoverflow.com/questions/11352047/finding-moving-average-from-data-points-in-python

geodezyx.stats.stats.movingaverage_ter(data, window_width)

Calculate moving average using cumulative sum (INTERNAL_ID_5).

Parameters:
  • data (array-like) – Input values.

  • window_width (int) – Width of the moving window.

Returns:

ma_vec – Moving average vector with reduced size compared to input.

Return type:

numpy.ndarray

Notes

Returns output shorter than the input.

References

https://stackoverflow.com/questions/11352047/finding-moving-average-from-data-points-in-python

geodezyx.stats.stats.outlier_above_below(X, threshold_values, reference=<function nanmean>, theshold_absolute=True, return_booleans=True, theshold_relative_value='reference', verbose=False)

Gives values of X which are between threshold values

Parameters:
  • X (array) – Input array

  • threshold_values (single value (float) or a 2-tuple) –

    (lower bound threshold, upper bound threshold)

    WARN: those value(s) have to be positives. Minus sign for lower bound and plus sign for upper one will be applied internally

  • reference (float or callable) – the central reference value can be a absolute fixed value (float) or a function (e.g. np.mean of np.median)

  • theshold_absolute (bool) –

    if True threshold_values are absolutes values
    >>> low = reference - threshold_values[0]
    >>> upp = reference + threshold_values[1]
    
    if False they are fractions of theshold_relative_value
    >>> low = reference - threshold_values[0] * theshold_relative_value
    >>> upp = reference + threshold_values[1] * theshold_relative_value
    

    (see also below)

  • theshold_relative_value (str or function) – if the string “reference” or None is given, then it the reference value which is used if it is a fuction (e.g. np.std()) then it is this value returned by this function which is used Only useful when theshold_absolute = False

  • return_booleans (bool) – return booleans or not

  • verbose (bool) – print debug information

Returns:

  • Xout (numpy array) – X between low_bound & upp_bound

  • bbool (numpy array) – X-sized array of booleans

geodezyx.stats.stats.outlier_above_below_binom(Y, X, threshold_values, reference=<function nanmean>, theshold_absolute=True, theshold_relative_value='reference', return_booleans=False, detrend_first=True, verbose=False)

Gives values of Y which are between threshold values, and correct an associated X so as X => Y(X)

Parameters:
  • Y (array) – The dependent variable

  • X (array) – The independent variable

  • threshold_values (single value (float) or a 2-tuple) – (lower bound threshold, upper bound threshold)

  • reference (float or callable) – the central reference value

  • theshold_absolute (bool) – if True threshold_values are absolutes values

  • theshold_relative_value (str or function) – relative threshold value specification

  • return_booleans (bool) – return booleans or not

  • detrend_first (bool) – remove trend before outlier detection

  • verbose (bool) – print debug information

Returns:

  • Yout (array) – Filtered Y values

  • Xout (array) – Corresponding filtered X values

  • Xout (array) – Corresponding filtered X values WARN : those value(s) have to be positives. Minus sign for lower bound and plus sign for upper one will be applied internally

  • reference (float or callable) – the central reference value can be a absolute fixed value (float) or a function (e.g. np.mean of np.median)

  • theshold_absolute (bool) – if True threshold_values are absolutes values >>> low = reference - threshold_values[0] >>> upp = reference + threshold_values[1] if False they are fractions of theshold_relative_value >>> low = reference - threshold_values[0] * theshold_relative_value >>> upp = reference + threshold_values[1] * theshold_relative_value (see also below)

  • theshold_relative_value (str or function) – if the string “reference” or None is given, then it the reference value which is used if it is a fuction (e.g. np.std()) then it is this value returned by this function which is used Only useful when theshold_absolute = False

  • detrend_first (bool) – detrend linear behavior of Y(X) first Recommended

  • return_booleans (bool) – return booleans or not

  • verbose (bool)

Returns:

  • Xout (numpy array) – X between low_bound & upp_bound

  • bbool (numpy array) – X-sized array of booleans

geodezyx.stats.stats.outlier_above_below_simple(X, low_bound, upp_bound, return_booleans=True)

Gives values of X which are between low_bound & upp_bound

Parameters:
  • X (list or numpy.array) – Values

  • upp_bound (low_bound &) – lower and upper bound of X values wished

  • return_booleans (bool) – return booleans or not

Returns:

  • Xout (numpy.array) – X between low_bound & upp_bound

  • bbool (bool) – X-sized array of booleans

geodezyx.stats.stats.outlier_mad(data, threshold=3.5, verbose=False, convert_to_np_array=True, mad_mode='median', seuil=None)

clean the outlier of Ya dataset using the MAD approach

Parameters:
  • data (list or numpy.array) – Values

  • threshold (float) – MAD threshold

  • verbose (bool)

  • convert_to_np_array (bool) – if True returns output as an array, if False as a regular list

  • mad_mode (str) – ‘median’ or ‘mean’ : MAD can also be based on mean (for experimental purposes)

  • seuil (float, optional) – legacy name of ‘threshold’ argument. will override threshold value if given

Returns:

  • dataout (numpy.array) – Values cleaned of outliers

  • boolbad (numpy.array) – Y-sized booleans

  • Source

  • ——

  • Utilisation de la MAD pour detecter les outliers

  • http (//www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm)

  • http (//web.ipac.caltech.edu/staff/fmasci/home/statistics_refs/BetterThanMAD.pdf)

geodezyx.stats.stats.outlier_mad_binom(Y, X, threshold=3.5, verbose=False, detrend_first=False, return_booleans=False)

clean the outlier of Y using the MAD approach and clean the corresponding values in X assuming that we have the function : X => Y(X) (be carefull, Y is the first argument)

Parameters:
  • Y (list or numpy.array) – Values

  • X (list or numpy.array) – X Values so as X => Y(X)

  • threshold (float) – MAD threshold

  • verbose (bool)

  • detrend_first (bool) – detrend linear behavior of Y(X) first

  • return_booleans (bool) – return good and bad values of Y and X as booleans

Returns:

  • Yclean & Xclean (numpy.array)

  • bb (numpy.array (if return_booleans == True)) – Y-sized booleans

geodezyx.stats.stats.outlier_mad_binom_legacy(X, Y, threshold=3.5, verbose=False, detrend_first=False, return_booleans=False)

clean the outlier of X and clean the corresponding values in Y

legacy : order of X Y is different than in the main version, and here it might be unstable for the detrend

geodezyx.stats.stats.outlier_sigma(datasigmain, threshold=3)

Remove outliers from sigma values above a threshold.

This is a very old and discontinued function, not very efficient.

Parameters:
  • datasigmain (array-like) – Input sigma values.

  • threshold (float, optional) – Threshold multiplier of median sigma. The default is 3.

Returns:

  • datasigmaout (numpy.ndarray) – Filtered sigma values.

  • boolbad (numpy.ndarray) – Boolean array indicating which values were kept (not outliers).

Notes

This function is deprecated and not recommended for new code. A point with sigma > threshold * median(sigmas) is considered an outlier.

Deprecation

This is a very old and discontinued function.

geodezyx.stats.stats.rms_mean(A)

Calculate the root mean square (RMS) mean of a list or array.

Parameters:

A (array-like) – Input values.

Returns:

The RMS mean value.

Return type:

float

Notes

The RMS mean is defined as: sqrt(mean(A^2)) NaN values are ignored in the calculation.

geodezyx.stats.stats.rms_mean_alternativ(A)

Calculate the “GRGS style” RMS of a list or array.

This calculates the standard deviation (RMS of deviations from the mean).

Parameters:

A (array-like) – Input values.

Returns:

The RMS value of deviations from the mean.

Return type:

float

Notes

This function calculates: sqrt(mean((A - mean(A))^2)) which is essentially the standard deviation. As of 1808, this is basically the standard deviation.

geodezyx.stats.stats.rms_mean_kouba(A, multipl_coef=3, deg_of_freedom=7)

Calculate RMS mean following Kouba’s method.

Parameters:
  • A (array-like) – Input values.

  • multipl_coef (int, optional) – Multiplication coefficient. The default is 3.

  • deg_of_freedom (int, optional) – Degrees of freedom. The default is 7.

Returns:

The RMS value calculated using Kouba’s method.

Return type:

float

Notes

Formula: sqrt(sum(A^2) / (multipl_coef * len(A) - deg_of_freedom))

geodezyx.stats.stats.runningMean(x, N)

Calculate running mean (INTERNAL_ID_2).

Parameters:
  • x (array-like) – Input values.

  • N (int) – Window size.

Returns:

y – Running mean with same size as input.

Return type:

numpy.ndarray

Notes

This implementation returns an output with the same size as input, but the result is shifted. Y[0] should be aligned with the middle of the first window.

References

http://stackoverflow.com/questions/13728392/moving-average-or-running-mean

geodezyx.stats.stats.running_mean(data_in, window, convolve_mode='same')

Gives running mean / moving average of data

Parameters:
  • data_in (list or numpy.array) – Values

  • window (float or int) – Size of the window for the running mean

  • convolve_mode (str) – (expert) mode for the underlying convolution

Returns:

data_run – running mean of data_in (sane size as data_in) should stay “same”

Return type:

numpy.array

Note

Nota :

After a stress test, this one is the only one to provide an output with same size as input AND not shifted This fct is slow but at leat, do the job

See running_mean_help for more details

convolve_mode should stay fixed as “same”

Nota 2 (for developpers) :

Wrapper based on fct movingaverage_bis

The substraction of the mean is an empirical trick

geodezyx.stats.stats.running_mean_core(x, N)

Calculate running mean using cumulative sum (INTERNAL_ID_3).

Parameters:
  • x (array-like) – Input values.

  • N (int) – Window size.

Returns:

xout – Running mean with reduced size.

Return type:

numpy.ndarray

Notes

More efficient than runningMean but returns output shorter than input.

References

https://stackoverflow.com/questions/13728392/moving-average-or-running-mean

geodezyx.stats.stats.running_mean_help()
geodezyx.stats.stats.sinusoide(T, A, omega, phi=0, f=None)

produce a sinusoidal waveform

Parameters:
  • T (float) – time variable.

  • A (float) – amplitude, the peak deviation of the function from zero.

  • omega (float, optional) – ω = 2πf, angular frequency, the rate of change of the function argument in units of radians per second.

  • phi (float, optional) – phase, specifies (in radians) where in its cycle the oscillation is at t = 0. The default is 0.

  • f (float) – ordinary frequency, the number of oscillations (cycles) that occur each second of time. If given, it overrides the angular frequency omega. Thus, to use it, declare also omega = 0 The default is None.

Returns:

a sinusoidal waveform.

Return type:

float

Notes

https://en.wikipedia.org/wiki/Sine_wave

geodezyx.stats.stats.smooth(x, window_len=11, window='hanning')

Smooth the data using a window with requested size.

This method is based on the convolution of a scaled window with the signal. The signal is prepared by introducing reflected copies of the signal (with the window size) in both ends so that transient parts are minimized in the beginning and end part of the output signal.

Note: works only for equally spaced data

Parameters:
  • x (array) – the input signal

  • window_len (int) – the dimension of the smoothing window; should be an odd integer

  • window (str) – the type of window from ‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’ flat window will produce a moving average smoothing.

Returns:

the smoothed signal

Return type:

array

Examples

>>> t = np.linspace(-2, 2, 0.1)
>>> x = np.sin(t) + np.random.randn(len(t))*0.1
>>> y = smooth(x)

See also

numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve, scipy.signal.lfilter

Notes

The window parameter could be the window itself if an array instead of a string length(output) != length(input), to correct this: return y[(window_len/2-1):-(window_len/2)] instead of just y.

References

http://scipy-cookbook.readthedocs.io/items/SignalSmooth.html

geodezyx.stats.stats.time_win_basic(start, end, Tlisin, Datalisin, outposix=True, invert=False, out_array=False, out_boolis=False, only_boolis=False)

Filter time series data within a specified time window.

Parameters:
  • start (datetime or float) – Start time of the window (POSIX if float).

  • end (datetime or float) – End time of the window (POSIX if float).

  • Tlisin (array-like of datetime or float) – Time values of the input time series.

  • Datalisin (array-like) – Data values of the input time series.

  • outposix (bool, optional) – If True, output times are in POSIX format. The default is True.

  • invert (bool, optional) – If True, invert the boolean mask (select values outside the window). The default is False.

  • out_array (bool, optional) – If True, convert outputs to numpy arrays. The default is False.

  • out_boolis (bool, optional) – If True, also return the boolean mask. The default is False.

  • only_boolis (bool, optional) – If True, only compute and return the boolean mask (faster). The default is False.

Returns:

  • Tlisout (array or None) – Time values within the window (or outside if invert=True). None if only_boolis=True.

  • Datalisout (array or None) – Data values within the window (or outside if invert=True). None if only_boolis=True.

  • boolis (numpy.ndarray, optional) – Boolean mask array if out_boolis=True or only_boolis=True.

Notes

Internally works with POSIX time format.

geodezyx.stats.stats.time_win_multi(start, end, Tlist, Datalislis, outposix=True, invert=False, out_array=False)

Filter multiple time series data within a specified time window.

Parameters:
  • start (datetime or float) – Start time of the window.

  • end (datetime or float) – End time of the window.

  • Tlist (array-like) – Time values (common to all data series).

  • Datalislis (list of array-like) – List of data arrays to filter.

  • outposix (bool, optional) – If True, output times are in POSIX format. The default is True.

  • invert (bool, optional) – If True, invert the selection. The default is False.

  • out_array (bool, optional) – If True, convert outputs to numpy arrays. The default is False.

Returns:

  • Tlisout (array) – Time values within the window.

  • Datalislisout (list of arrays) – Filtered data arrays.

See also

time_win_basic

Filter a single time series.

geodezyx.stats.stats.time_win_multi_start_end(Start_list_in, End_list_in, Tlisin, Datalisin, outposix=True, invert=False, out_array=False, out_boolis=False)

Filter time series data within multiple specified time windows.

All windows must be satisfied simultaneously (logical AND operation).

Parameters:
  • Start_list_in (array-like) – List of start times for each window.

  • End_list_in (array-like) – List of end times for each window.

  • Tlisin (array-like) – Time values of the input time series.

  • Datalisin (array-like) – Data values of the input time series.

  • outposix (bool, optional) – If True, output times are in POSIX format. The default is True.

  • invert (bool, optional) – If True, invert the selection. The default is False.

  • out_array (bool, optional) – If True, convert outputs to numpy arrays. The default is False.

  • out_boolis (bool, optional) – If True, also return boolean masks. The default is False.

Returns:

  • Tlisout (array) – Time values satisfying all windows.

  • Datalisout (array) – Data values satisfying all windows.

  • boolis_opera (numpy.ndarray, optional) – Combined boolean mask if out_boolis=True.

  • boolis_stk (numpy.ndarray, optional) – Stack of individual window boolean masks if out_boolis=True.

Notes

Internally works in POSIX time format. A point is included only if it falls within ALL specified time windows (AND operation).

See also

time_win_basic

Filter using a single time window.

geodezyx.stats.stats.wrapTo180(lon)

Wrap longitude values to the range [-180, 180).

Parameters:

lon (float or array-like) – Longitude value(s) in degrees.

Returns:

Wrapped longitude value(s).

Return type:

float or numpy.ndarray

Notes

This function is based on MATLAB’s wrapTo180 function.

geodezyx.stats.stats.wrapTo360(lon)

Wrap longitude values to the range [0, 360).

Parameters:

lon (float or array-like) – Longitude value(s) in degrees.

Returns:

Wrapped longitude value(s).

Return type:

float or numpy.ndarray

Notes

This function is based on MATLAB’s wrapTo360 function.