From 28a61654f950cc8e6a1ec684e433302167ad581c Mon Sep 17 00:00:00 2001
From: akx
Date: Tue, 27 Sep 2022 11:15:37 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20koodikli?=
 =?UTF-8?q?nikka/palkkakysely@d3cf581c2b3d609e4410de12aa78a8e182eebd23=20?=
 =?UTF-8?q?=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 2021/charts.html           |    6 +-
 2021/data.xlsx             |  Bin 48353 -> 48354 bytes
 2021/index.html            |    6 +-
 2021/profiling_report.html | 2254 ++++++++++++++++++------------------
 2021/raw.xlsx              |  Bin 53403 -> 53403 bytes
 5 files changed, 1133 insertions(+), 1133 deletions(-)

diff --git a/2021/charts.html b/2021/charts.html
index b4e2d22..cf6c101 100644
--- a/2021/charts.html
+++ b/2021/charts.html
@@ -9,10 +9,10 @@
-
+
2022-09-26T14:12:59.980272image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/2022-09-27T11:15:21.567035image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
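
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the report generator's own code: the helper names `average_ranks`, `pearson_r` and `spearman_rho` are made up for this example, ties are given the average of their rank positions (an assumption), and the sample data is invented.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Invented data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this data ρ reaches the maximum (up to floating-point rounding) because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.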
2022-09-26T14:13:00.126482image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

2022-09-27T11:15:21.732663image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
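
As a rough sketch of the formula and the invariance property, the snippet below computes r directly and then again after shifting and rescaling both variables. The function name `pearson_r` and the sample numbers are invented for illustration.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the deviations cancel,
    so plain sums are used here.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Invented sample data.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 8.0]

r = pearson_r(x, y)

# Changing location and (positive) scale of each variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 1 for b in y])

# Negating one variable flips the sign of r but not its magnitude.
r_flipped = pearson_r(x, [-b for b in y])
```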
2022-09-26T14:13:00.253171image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

2022-09-27T11:15:21.859905image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
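
The pair counting described above can be sketched as follows. This implements the formula exactly as stated (often called tau-a), with tied pairs counted as neither concordant nor discordant; library implementations commonly apply a tie correction, so results may differ on data with ties. The name `kendall_tau_a` and the data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations
        # the same way, discordant when they order them oppositely.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Invented data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.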
2022-09-26T14:13:00.384631image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

2022-09-27T11:15:21.992605image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
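
A minimal sketch of a bias-corrected Cramér's V computed from a contingency table, following the correction usually attributed to Bergsma (2013). It is an illustration under stated assumptions, not the report generator's code: the function name `cramers_v_corrected` is made up, the table is a plain list of rows, and every row and column total is assumed to be non-zero.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of observed counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Invented tables: no association, then perfect association.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
```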
2022-09-26T14:13:00.531667image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

2022-09-27T11:15:22.140384image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

2022-09-26T14:12:56.051172image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/


2022-09-27T11:15:17.639322image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
A simple visualization of nullity by column.
2022-09-26T14:12:56.295729image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-27T11:15:17.887252image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
2022-09-26T14:12:56.517987image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-27T11:15:18.117403image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
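
One way to read "nullity correlation" is as the ordinary Pearson correlation between the present/missing indicators of two columns. The sketch below works under that assumption; the function name `nullity_correlation`, the use of `None` for missing values, and the column contents are all invented for illustration.

```python
from statistics import mean


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 'value is present' indicators of two columns.

    Assumes each column has at least one missing and one present value,
    otherwise the indicator has zero variance and the ratio is undefined.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    return cov / (sa * sb)


# Invented columns: the first two are missing together,
# the third is present exactly when the first is missing.
first = [5.0, None, 2.5, None]
second = [80.0, None, 30.0, None]
third = [None, "x", None, "y"]

together = nullity_correlation(first, second)
opposite = nullity_correlation(first, third)
```

A value near +1 means the two columns tend to be filled in or missing together; a value near -1 means one tends to be present when the other is missing.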
2022-09-26T14:12:56.711151image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
2022-09-27T11:15:18.312983image/svg+xmlMatplotlib v3.5.3, https://matplotlib.org/
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

  | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN | 6916.666667
1 | 2021-02-15 11:57:19.676 | Turku | 33 | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN | 5208.333333
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN | 2500.000000
3 | 2021-02-15 11:58:15.261 | Tampere | 33 | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN | 8333.333333
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN | 3125.000000
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 43 | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 8000.0 | 100000.0 | True | NaN | NaN | 8333.333333
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 33 | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN | 11666.666667
7 | 2021-02-15 12:00:04.655 | Tampere | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 4250.0 | 54000.0 | True | NaN | NaN | 4500.000000
8 | 2021-02-15 12:01:00.769 | Tampere | 33 | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | Toimisto | 4000.0 | 50000.0 | False | NaN | NaN | 4166.666667
9 | 2021-02-15 12:02:03.577 | Tallinna | 33 | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN | 16666.666667

Last rows

    | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
490 | 2021-02-25 21:17:36.323 | PK-Seutu | 33 | mies | 10.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjemistokehittäjä | Toimisto | 4600.0 | 58000.0 | True | NaN | NaN | 4833.333333
491 | 2021-02-26 09:32:59.778 | Oulu | 48 | mies | 21.0 | Työntekijä / palkollinen | 1.0 | Backend-koodari | Etä | 5000.0 | 70000.0 | True | Nokia | NaN | 5833.333333
492 | 2021-02-26 12:16:19.696 | Tampere | 38 | mies | 15.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistosuunnittelija | Toimisto | 4300.0 | 53750.0 | False | Gofore | NaN | 4479.166667
493 | 2021-02-26 12:21:52.296 | Tampere | 33 | mies | 11.0 | Freelancer | 1.0 | frontend | Etä | NaN | 157300.0 | True | NaN | NaN | 13108.333333
494 | 2021-02-26 12:46:37.404 | PK-Seutu | 33 | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | Toimisto | 6500.0 | 81250.0 | True | Siili | NaN | 6770.833333
495 | 2021-02-26 12:47:26.116 | PK-Seutu | 33 | nainen | 3.0 | Työntekijä / palkollinen | 1.0 | Full-stack | 50/50 | 3800.0 | NaN | False | NaN | NaN | NaN
496 | 2021-02-26 13:24:35.647 | PK-Seutu | 33 | mies | NaN | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | 50/50 | NaN | 75000.0 | True | Vincit | NaN | 6250.000000
497 | 2021-02-26 16:28:30.010 | Tampere | 43 | mies | 20.0 | Työntekijä / palkollinen | 1.0 | full-stack | Toimisto | 4800.0 | 61000.0 | True | NaN | NaN | 5083.333333
498 | 2021-02-27 12:38:00.760 | Tampere | 33 | mies | 9.0 | Työntekijä / palkollinen | 1.0 | backend ja devops | Etä | 4270.0 | 54000.0 | False | NaN | NaN | 4500.000000
499 | 2021-02-27 17:49:24.789 | Kouvola | 33 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack Ohjelmistosuunnittelija | Etä | 2800.0 | 35000.0 | False | NaN | NaN | 2916.666667
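
In every sample row shown, Kk-tulot (monthly income) matches Vuositulot (annual income) divided by 12, and is NaN where Vuositulot is NaN. The sketch below reproduces that relationship as an observation about the sample rows only; how the column is actually derived is not shown in this diff, and the helper name `monthly_income` is hypothetical.

```python
def monthly_income(annual_income):
    """Annual income spread evenly over 12 months; a missing value stays missing."""
    if annual_income is None:
        return None
    return annual_income / 12


# (Vuositulot, Kk-tulot) pairs copied from the sample rows; None stands in for NaN.
sample = [
    (83000.0, 6916.666667),
    (62500.0, 5208.333333),
    (157300.0, 13108.333333),
    (None, None),
]

matches = [
    (expected is None and monthly_income(annual) is None)
    or (expected is not None and abs(monthly_income(annual) - expected) < 1e-5)
    for annual, expected in sample
]
```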