From f39b82eac5ed6ea6e2ed487e6d0842b57d11f8c2 Mon Sep 17 00:00:00 2001 From: akx Date: Fri, 19 Feb 2021 14:51:46 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20koodikli?= =?UTF-8?q?nikka/palkkakysely@41a54d0c6cfa9f90cefd5cd4b5dec420205c3959=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- charts.html | 6 +- data.csv | 1 + data.html | 16 + data.json | 2 +- data.xlsx | Bin 39390 -> 39459 bytes index.html | 6 +- profiling_report.html | 3437 ++++++++++++++++++++--------------------- raw.tsv | 3 +- raw.xlsx | Bin 35019 -> 35082 bytes 9 files changed, 1712 insertions(+), 1759 deletions(-) diff --git a/charts.html b/charts.html index 053ecaa..9ed83d6 100644 --- a/charts.html +++ b/charts.html @@ -34,14 +34,14 @@ -
+
2021-02-19T14:47:46.013551image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/2021-02-19T14:51:43.422286image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Pearson's r

The Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. It's value lies between -1 and +1, -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
2021-02-19T14:47:46.185837image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Pearson's r

The Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. It's value lies between -1 and +1, -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
2021-02-19T14:51:43.619401image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Spearman's ρ

The Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better in catching nonlinear monotonic correlations than Pearson's r. It's value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
2021-02-19T14:47:46.360926image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Spearman's ρ

The Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better in catching nonlinear monotonic correlations than Pearson's r. It's value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
2021-02-19T14:51:43.816061image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. It's value lies between -1 and +1, -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
2021-02-19T14:47:46.548399image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. It's value lies between -1 and +1, -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
2021-02-19T14:51:44.022189image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.

Missing values

2021-02-19T14:47:39.819441image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.

Missing values

2021-02-19T14:51:36.312343image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
A simple visualization of nullity by column.
2021-02-19T14:47:40.156388image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
A simple visualization of nullity by column.
2021-02-19T14:51:36.713803image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
2021-02-19T14:47:40.649173image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
2021-02-19T14:51:37.204624image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
2021-02-19T14:47:40.931456image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
2021-02-19T14:51:37.528472image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

TimestampKaupunkiIkäSukupuoliTyökokemusTyösuhteen luonneTyöaikaRooliEtäKuukausipalkkaVuositulotKilpailukykyinenTyöpaikkaVapaa sana
02021-02-15 11:57:08.316PK-Seutu31-35 vNaN10.0Työntekijä / palkollinen1.0Arkkitehti50/506500.083000.0TrueNaNNaN
12021-02-15 11:57:19.676Turku31-35 vmies14.0Työntekijä / palkollinen1.0full-stackEtä5000.062500.0TrueNaNNaN
22021-02-15 11:58:03.592PK-Seutu26-30 vmies2.0Työntekijä / palkollinen1.0Full-stack ohjelmistokehittäjäEtä2475.030000.0FalseNaNNaN
32021-02-15 11:58:15.261Tampere31-35 vmies22.0Yrittäjä1.0web-arkkitehtiEtä4300.0100000.0TrueNaNNaN
42021-02-15 11:58:16.983PK-Seutu26-30 vmies2.0Työntekijä / palkollinen1.0OhjelmistokehittäjäEtä3000.037500.0FalseNaNNaN
52021-02-15 11:58:49.454PK-Seutu41-45 vmies23.0Työntekijä / palkollinen1.0OhjelmistokehittäjäNaN8000.0100000.0TrueNaNNaN
62021-02-15 12:00:03.771PK-Seutu31-35 vmies10.0Freelancer1.0OhjelmistokehittäjäEtä6000.0140000.0TrueNaNNaN
72021-02-15 12:00:04.655Tampere31-35 vNaN10.0Työntekijä / palkollinen1.0OhjelmistokehittäjäNaN4250.054000.0TrueNaNNaN
82021-02-15 12:01:00.769Tampere31-35 vmies6.0Työntekijä / palkollinen1.0Lead developerNaN4000.050000.0FalseNaNNaN
92021-02-15 12:02:03.577Tallinna31-35 vmies12.0Freelancer1.0NaNEtäNaN200000.0TrueQuestradeNaN

Last rows

TimestampKaupunkiIkäSukupuoliTyökokemusTyösuhteen luonneTyöaikaRooliEtäKuukausipalkkaVuositulotKilpailukykyinenTyöpaikkaVapaa sana
4142021-02-19 15:34:53.741Tampere26-30 vmuu7.0Työntekijä / palkollinen1.0Full-stack developer50/505550.069400.0TrueNaNNaN
4152021-02-19 15:40:16.336PK-Seutu26-30 vmies5.0Työntekijä / palkollinen0.8Full-stack/mobiili/designEtä7000.090000.0TrueMavericksNaN
4162021-02-19 16:04:50.348Tampere36-40 vmies16.0Työntekijä / palkollinen1.0OhjelmistokehittäjäNaN4800.065000.0TrueNaNBonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.
4172021-02-19 16:17:29.891PK-Seutu36-40 vnainen8.0Työntekijä / palkollinenNaNProduct Owner50/504500.056200.0TrueNaNNaN
4182021-02-19 16:26:32.700PK-Seutu36-40 vmies16.0Työntekijä / palkollinen1.0Mobile SWEtä8000.095000.0TrueMavericksNaN
4192021-02-19 16:33:27.762PK-Seutu31-35 vmies11.0Työntekijä / palkollinen1.0Full stack50/507000.087500.0TrueMavericksNaN
4202021-02-19 16:34:07.545PK-Seutu31-35 vmies12.0Työntekijä / palkollinen1.0full-stackEtä8000.095000.0TrueMavericksNaN
4212021-02-19 16:36:55.938Tampere41-45 vmies22.0Työntekijä / palkollinen0.8ohjelmistokehittäjä (backend) / arkkitehtiEtä4700.058750.0FalseNaNNaN
4222021-02-19 16:38:41.403PK-Seutu36-40 vmies2.0Työntekijä / palkollinen1.0WordPress-kehittäjä50/503000.037500.0FalseNaNNaN
4232021-02-19 16:39:14.831Tampere31-35 vmies5.0Työntekijä / palkollinen1.0Data scientistEtä4300.053750.0NaNWapiceNaN