From 1f33e37fa80ccaf352a78eabdce99c32fef305f0 Mon Sep 17 00:00:00 2001
From: akx
Date: Mon, 22 Feb 2021 12:20:00 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20koodikli?=
 =?UTF-8?q?nikka/palkkakysely@ac933d101db16236a47b0eb511fbfd41125f9974=20?=
 =?UTF-8?q?=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 charts.html           |    6 +-
 data.csv              |    8 +
 data.html             |  136 +
 data.json             |    2 +-
 data.xlsx             |  Bin 44835 -> 45548 bytes
 index.html            |   10 +-
 profiling_report.html | 7119 ++++++++++++++++++++---------------------
 raw.tsv               |   10 +-
 raw.xlsx              |  Bin 37384 -> 51095 bytes
 9 files changed, 3558 insertions(+), 3733 deletions(-)

diff --git a/charts.html b/charts.html
index beedd48..139edd1 100644
--- a/charts.html
+++ b/charts.html
@@ -34,14 +34,14 @@

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (shifting either variable, or rescaling it by a positive factor), so for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
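
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` is hypothetical and the inputs are assumed to be equally long lists of numbers with no missing values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations. The 1/n (or 1/(n-1))
    # normalisation cancels between numerator and denominator, so it is omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)
```

Shifting or positively rescaling either input, for example `pearson_r([10 * x + 3 for x in xs], ys)`, should give the same value as `pearson_r(xs, ys)` up to floating-point rounding.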

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
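
As a rough sketch of that recipe: replace each value by its rank, then compute the Pearson coefficient of the two rank variables. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the mean of their positions is an assumption of this sketch rather than something the text above specifies.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

For a strictly increasing but nonlinear relationship such as `ys = [x ** 3 for x in xs]`, the ranks of the two variables coincide, so ρ reaches its maximum even though the relationship is not a straight line.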

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
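
The pair-counting definition above translates directly into code. The sketch below implements exactly that formula, often called tau-a; the function name `kendall_tau_a` is hypothetical. Libraries commonly apply an additional correction for ties (tau-b), which this sketch does not attempt, so results may differ on tied data.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither total
    return (concordant - discordant) / len(pairs)
```

The quadratic loop over all pairs is fine for a few hundred rows; faster algorithms exist but would obscure the definition.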

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
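
For illustration, here is a minimal sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction as it is commonly stated for Bergsma (2013). The function name `cramers_v_corrected` is hypothetical, the chi-squared statistic is computed without a continuity correction, and the table is assumed to have at least two rows and two columns with no empty row or column. It is not necessarily identical to what the report generator does internally.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table given as a list of rows of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))
```

A table whose rows are proportional to each other, such as `[[10, 20], [20, 40]]`, has a chi-squared statistic of zero, so the corrected coefficient is clamped to 0.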

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
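
One way to make nullity correlation concrete is to turn each column into a presence indicator (1 if the value is present, 0 if it is missing) and correlate the indicators. The sketch below assumes that reading; the function name `nullity_correlation` is hypothetical, and missing values are represented as `None`.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    if n != len(b) or n == 0:
        raise ValueError("columns must be non-empty and equally long")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing has a constant
        # indicator, so the correlation is undefined.
        raise ValueError("each column needs both present and missing values")
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / sqrt(var_a * var_b)
```

A value near +1 means the two columns tend to be filled in (or left blank) together; a value near -1 means one tends to be present exactly when the other is missing.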
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 48 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN | 6916.666667
1 | 2021-02-15 11:57:19.676 | Turku | 48 | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN | 5208.333333
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 41 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN | 2500.000000
3 | 2021-02-15 11:58:15.261 | Tampere | 48 | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN | 8333.333333
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 41 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN | 3125.000000
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 64 | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 8000.0 | 100000.0 | True | NaN | NaN | 8333.333333
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 48 | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN | 11666.666667
7 | 2021-02-15 12:00:04.655 | Tampere | 48 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 4250.0 | 54000.0 | True | NaN | NaN | 4500.000000
8 | 2021-02-15 12:01:00.769 | Tampere | 48 | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | Toimisto | 4000.0 | 50000.0 | False | NaN | NaN | 4166.666667
9 | 2021-02-15 12:02:03.577 | Tallinna | 48 | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN | 16666.666667

Last rows

(index) | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
450 | 2021-02-21 17:09:05.499 | PK-Seutu | 48 | mies | 10.0 | Työntekijä / palkollinen | 1.0 | data engineering, team lead | Etä | 5300.0 | 71500.0 | False | NaN | NaN | 5958.333333
451 | 2021-02-21 18:34:07.903 | PK-Seutu | 34 | mies | 1.0 | Työntekijä / palkollinen | 1.0 | Frontend | Toimisto | 2600.0 | 31200.0 | False | NaN | NaN | 2600.000000
452 | 2021-02-21 23:03:57.647 | PK-Seutu | 56 | mies | 22.0 | Yrittäjä | 1.0 | Full-stack | Toimisto | 5000.0 | 85000.0 | True | NaN | NaN | 7083.333333
453 | 2021-02-22 07:33:10.449 | Hämeenlinna | 48 | NaN | 5.0 | Työntekijä / palkollinen | 0.8 | Ohjelmistokehittäjä | Etä | 2400.0 | 25000.0 | False | NaN | NaN | 2083.333333
454 | 2021-02-22 07:47:19.579 | PK-Seutu | 56 | mies | 12.0 | Työntekijä / palkollinen | 1.0 | Sovelluskehittäjä | Toimisto | 6000.0 | 75000.0 | False | NaN | Pieni firma ja paljon hattuja päässä. Palkka on hyvä, mutta ei korvaa stressiä ja painetta. | 6250.000000
455 | 2021-02-22 09:49:11.345 | Lontoo | 56 | mies | 17.0 | Työntekijä / palkollinen | 1.0 | CTO | Etä | 8500.0 | 200000.0 | True | NaN | NaN | 16666.666667
456 | 2021-02-22 10:02:50.113 | PK-Seutu | 48 | mies | 3.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3200.0 | 40000.0 | False | Siili Solutions Oyj | NaN | 3333.333333
457 | 2021-02-22 10:36:42.074 | PK-Seutu | 48 | mies | 20.0 | Yrittäjä | 1.0 | CTO | Toimisto | 4000.0 | 50000.0 | False | NaN | hyvä kysely | 4166.666667
458 | 2021-02-22 11:03:33.749 | Tampere | 56 | mies | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 3858.0 | 48225.0 | True | Wakeone | NaN | 4018.750000
459 | 2021-02-22 11:05:29.788 | PK-Seutu | 56 | nainen | 12.0 | Työntekijä / palkollinen | 1.0 | Myynnistä vastaava | 50/50 | 8200.0 | 100000.0 | True | NaN | NaN | 8333.333333