Overview

Dataset statistics

Number of variables: 14
Number of observations: 424
Missing cells: 993
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.2 KiB
Average record size in memory: 87.5 B
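
The missing-cell figures can be cross-checked against the per-variable counts reported later in this document. The sketch below is only an illustration: the dictionary simply restates the "Missing" count of each variable section, and the helper name `missing_cell_summary` is hypothetical, not part of pandas-profiling.

```python
# Per-column missing counts, restated from the variable sections of this report.
missing_per_column = {
    "Timestamp": 0, "Kaupunki": 4, "Ikä": 2, "Sukupuoli": 32,
    "Työkokemus": 4, "Työsuhteen luonne": 1, "Työaika": 16, "Rooli": 10,
    "Etä": 145, "Kuukausipalkka": 37, "Vuositulot": 11,
    "Kilpailukykyinen": 14, "Työpaikka": 326, "Vapaa sana": 391,
}
n_rows = 424  # number of observations


def missing_cell_summary(missing, n_rows):
    """Return (total missing cells, percentage of all cells that are missing)."""
    total_missing = sum(missing.values())
    total_cells = len(missing) * n_rows  # variables x observations
    return total_missing, 100.0 * total_missing / total_cells


total_missing, pct = missing_cell_summary(missing_per_column, n_rows)
print(total_missing, f"{pct:.1f}%")
```

The percentage is taken over all cells (14 variables times 424 observations), not over rows.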

Variable types

DateTime: 1
Categorical: 9
Numeric: 3
Boolean: 1

Warnings

Rooli has a high cardinality: 226 distinct values (High cardinality)
Työpaikka has a high cardinality: 67 distinct values (High cardinality)
Kilpailukykyinen is highly correlated with Vapaa sana (High correlation)
Vapaa sana is highly correlated with Kilpailukykyinen and 1 other field (High correlation)
Työpaikka is highly correlated with Vapaa sana (High correlation)
Sukupuoli has 32 (7.5%) missing values (Missing)
Työaika has 16 (3.8%) missing values (Missing)
Rooli has 10 (2.4%) missing values (Missing)
Etä has 145 (34.2%) missing values (Missing)
Kuukausipalkka has 37 (8.7%) missing values (Missing)
Vuositulot has 11 (2.6%) missing values (Missing)
Kilpailukykyinen has 14 (3.3%) missing values (Missing)
Työpaikka has 326 (76.9%) missing values (Missing)
Vapaa sana has 391 (92.2%) missing values (Missing)
Vapaa sana is uniformly distributed (Uniform)
Timestamp has unique values (Unique)

Reproduction

Analysis started: 2021-02-19 14:47:37.610890
Analysis finished: 2021-02-19 14:47:41.080082
Duration: 3.47 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml
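
As a small consistency check, the reported duration follows from the two timestamps. This is just a sketch using the standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2021-02-19 14:47:37.610890", FMT)
finished = datetime.strptime("2021-02-19 14:47:41.080082", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```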

Variables

Timestamp
Date

UNIQUE

Distinct: 424
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-19 16:39:14.831000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

Distinct: 25
Distinct (%): 6.0%
Missing: 4
Missing (%): 0.9%
Memory size: 1.3 KiB

PK-Seutu: 215
Tampere: 99
Turku: 42
Oulu: 22
Jyväskylä: 17
Other values (20): 25

Length

Max length: 15
Median length: 8
Mean length: 7.257142857
Min length: 2

Characters and Unicode

Total characters: 3048
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
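
To illustrate what "character properties" means here, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation; the helper name `category_counts` and the two sample strings are chosen for illustration. The short codes map to the names used in this report, for example `Lu` is "Uppercase Letter", `Ll` is "Lowercase Letter" and `Pd` is "Dash Punctuation".

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every string."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two city names from the Kaupunki column; \u00e4 is the letter "ä".
counts = category_counts(["PK-Seutu", "Jyv\u00e4skyl\u00e4"])
print(dict(counts))
```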

Unique

Unique: 18
Unique (%): 4.3%

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu
Value              Count  Frequency (%)
PK-Seutu           215    50.7%
Tampere            99     23.3%
Turku              42     9.9%
Oulu               22     5.2%
Jyväskylä          17     4.0%
Kuopio             5      1.2%
Pori               2      0.5%
Ruotsi             1      0.2%
Wien               1      0.2%
Viimsi             1      0.2%
Other values (15)  15     3.5%
(Missing)          4      0.9%
Histogram of lengths of the category
ValueCountFrequency (%)
pk-seutu215
50.7%
tampere99
23.3%
turku42
 
9.9%
oulu22
 
5.2%
jyväskylä17
 
4.0%
kuopio5
 
1.2%
pori2
 
0.5%
viimsi1
 
0.2%
new1
 
0.2%
francisco1
 
0.2%
Other values (19)19
 
4.5%

Most occurring characters

ValueCountFrequency (%)
u570
18.7%
e422
13.8%
K223
 
7.3%
t220
 
7.2%
P218
 
7.2%
-217
 
7.1%
S217
 
7.1%
r147
 
4.8%
T142
 
4.7%
a116
 
3.8%
Other values (29)556
18.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1971
64.7%
Uppercase Letter855
28.1%
Dash Punctuation217
 
7.1%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
u570
28.9%
e422
21.4%
t220
 
11.2%
r147
 
7.5%
a116
 
5.9%
p105
 
5.3%
m103
 
5.2%
k60
 
3.0%
l47
 
2.4%
ä41
 
2.1%
Other values (10)140
 
7.1%
ValueCountFrequency (%)
K223
26.1%
P218
25.5%
S217
25.4%
T142
16.6%
O22
 
2.6%
J18
 
2.1%
E3
 
0.4%
L3
 
0.4%
V2
 
0.2%
W1
 
0.1%
Other values (6)6
 
0.7%
ValueCountFrequency (%)
-217
100.0%
ValueCountFrequency (%)
4
100.0%
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2826
92.7%
Common222
 
7.3%

Most frequent character per script

ValueCountFrequency (%)
u570
20.2%
e422
14.9%
K223
 
7.9%
t220
 
7.8%
P218
 
7.7%
S217
 
7.7%
r147
 
5.2%
T142
 
5.0%
a116
 
4.1%
p105
 
3.7%
Other values (26)446
15.8%
ValueCountFrequency (%)
-217
97.7%
4
 
1.8%
,1
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII3007
98.7%
None41
 
1.3%

Most frequent character per block

ValueCountFrequency (%)
u570
19.0%
e422
14.0%
K223
 
7.4%
t220
 
7.3%
P218
 
7.2%
-217
 
7.2%
S217
 
7.2%
r147
 
4.9%
T142
 
4.7%
a116
 
3.9%
Other values (28)515
17.1%
ValueCountFrequency (%)
ä41
100.0%

Ikä
Categorical

Distinct: 7
Distinct (%): 1.7%
Missing: 2
Missing (%): 0.5%
Memory size: 908.0 B

31-35 v: 142
26-30 v: 104
36-40 v: 92
41-45 v: 46
21-25 v: 26
Other values (2): 12

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2954
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31-35 v
2nd row: 31-35 v
3rd row: 26-30 v
4th row: 31-35 v
5th row: 26-30 v

Value      Count  Frequency (%)
31-35 v    142    33.5%
26-30 v    104    24.5%
36-40 v    92     21.7%
41-45 v    46     10.8%
21-25 v    26     6.1%
46-50 v    7      1.7%
51-55 v    5      1.2%
(Missing)  2      0.5%
Histogram of lengths of the category
ValueCountFrequency (%)
v422
50.0%
31-35142
 
16.8%
26-30104
 
12.3%
36-4092
 
10.9%
41-4546
 
5.5%
21-2526
 
3.1%
46-507
 
0.8%
51-555
 
0.6%

Most occurring characters

ValueCountFrequency (%)
3480
16.2%
-422
14.3%
422
14.3%
v422
14.3%
5236
8.0%
1219
7.4%
6203
6.9%
0203
6.9%
4191
 
6.5%
2156
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1688
57.1%
Dash Punctuation422
 
14.3%
Space Separator422
 
14.3%
Lowercase Letter422
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
3480
28.4%
5236
14.0%
1219
13.0%
6203
12.0%
0203
12.0%
4191
 
11.3%
2156
 
9.2%
ValueCountFrequency (%)
-422
100.0%
ValueCountFrequency (%)
422
100.0%
ValueCountFrequency (%)
v422
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2532
85.7%
Latin422
 
14.3%

Most frequent character per script

ValueCountFrequency (%)
3480
19.0%
-422
16.7%
422
16.7%
5236
9.3%
1219
8.6%
6203
8.0%
0203
8.0%
4191
 
7.5%
2156
 
6.2%
ValueCountFrequency (%)
v422
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2954
100.0%

Most frequent character per block

ValueCountFrequency (%)
3480
16.2%
-422
14.3%
422
14.3%
v422
14.3%
5236
8.0%
1219
7.4%
6203
6.9%
0203
6.9%
4191
 
6.5%
2156
 
5.3%

Sukupuoli
Categorical

MISSING

Distinct: 3
Distinct (%): 0.8%
Missing: 32
Missing (%): 7.5%
Memory size: 684.0 B

mies: 356
nainen: 28
muu: 8

Length

Max length: 6
Median length: 4
Mean length: 4.12244898
Min length: 3

Characters and Unicode

Total characters: 1616
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Value      Count  Frequency (%)
mies       356    84.0%
nainen     28     6.6%
muu        8      1.9%
(Missing)  32     7.5%
Histogram of lengths of the category
ValueCountFrequency (%)
mies356
90.8%
nainen28
 
7.1%
muu8
 
2.0%

Most occurring characters

ValueCountFrequency (%)
i384
23.8%
e384
23.8%
m364
22.5%
s356
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1616
100.0%

Most frequent character per category

ValueCountFrequency (%)
i384
23.8%
e384
23.8%
m364
22.5%
s356
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1616
100.0%

Most frequent character per script

ValueCountFrequency (%)
i384
23.8%
e384
23.8%
m364
22.5%
s356
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1616
100.0%

Most frequent character per block

ValueCountFrequency (%)
i384
23.8%
e384
23.8%
m364
22.5%
s356
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Työkokemus
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 6.4%
Missing: 4
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.616666667
Minimum: 0
Maximum: 30
Zeros: 3
Zeros (%): 0.7%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 9
Q3: 13
95-th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.067103545
Coefficient of variation (CV): 0.6308946494
Kurtosis: -0.02410700448
Mean: 9.616666667
Median Absolute Deviation (MAD): 4
Skewness: 0.720885956
Sum: 4039
Variance: 36.80974543
Monotonicity: Not monotonic
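
Several of the Työkokemus statistics are derived from one another. The sketch below recomputes them from the values printed in this section; it assumes the usual definitions (mean = sum / non-missing count, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 - Q1), and the variable names are illustrative.

```python
import math

# Values copied from the Työkokemus section of this report.
mean, std = 9.616666667, 6.067103545
total, n_rows, n_missing = 4039, 424, 4
q1, q3, minimum, maximum = 5, 13, 0, 30

n_valid = n_rows - n_missing  # statistics are computed over non-missing rows

derived = {
    "mean": total / n_valid,       # sum divided by the number of valid rows
    "variance": std ** 2,          # variance is the squared standard deviation
    "cv": std / mean,              # coefficient of variation
    "iqr": q3 - q1,                # interquartile range
    "range": maximum - minimum,
}

for name, value in derived.items():
    print(f"{name}: {value:.6g}")
```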
Histogram with fixed size bins (bins=27)
Common values

Value              Count  Frequency (%)
5                  44     10.4%
10                 34     8.0%
4                  28     6.6%
7                  26     6.1%
20                 25     5.9%
3                  24     5.7%
15                 23     5.4%
2                  22     5.2%
11                 22     5.2%
13                 22     5.2%
Other values (17)  150    35.4%

Minimum 5 values

Value  Count  Frequency (%)
0      3      0.7%
1      15     3.5%
2      22     5.2%
3      24     5.7%
4      28     6.6%

Maximum 5 values

Value  Count  Frequency (%)
30     2      0.5%
25     5      1.2%
24     2      0.5%
23     4      0.9%
22     4      0.9%

Työsuhteen luonne
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.4 KiB

Työntekijä / palkollinen: 378
Freelancer: 23
Yrittäjä: 22

Length

Max length: 24
Median length: 24
Mean length: 22.40661939
Min length: 8

Characters and Unicode

Total characters: 9478
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Value                     Count  Frequency (%)
Työntekijä / palkollinen  378    89.2%
Freelancer                23     5.4%
Yrittäjä                  22     5.2%
(Missing)                 1      0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
378
32.1%
työntekijä378
32.1%
palkollinen378
32.1%
freelancer23
 
2.0%
yrittäjä22
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n1157
12.2%
l1157
12.2%
e825
 
8.7%
i778
 
8.2%
k756
 
8.0%
756
 
8.0%
t422
 
4.5%
ä422
 
4.5%
a401
 
4.2%
j400
 
4.2%
Other values (10)2404
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7921
83.6%
Space Separator756
 
8.0%
Uppercase Letter423
 
4.5%
Other Punctuation378
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
n1157
14.6%
l1157
14.6%
e825
10.4%
i778
9.8%
k756
9.5%
t422
 
5.3%
ä422
 
5.3%
a401
 
5.1%
j400
 
5.0%
y378
 
4.8%
Other values (5)1225
15.5%
ValueCountFrequency (%)
T378
89.4%
F23
 
5.4%
Y22
 
5.2%
ValueCountFrequency (%)
756
100.0%
ValueCountFrequency (%)
/378
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8344
88.0%
Common1134
 
12.0%

Most frequent character per script

ValueCountFrequency (%)
n1157
13.9%
l1157
13.9%
e825
9.9%
i778
9.3%
k756
9.1%
t422
 
5.1%
ä422
 
5.1%
a401
 
4.8%
j400
 
4.8%
T378
 
4.5%
Other values (8)1648
19.8%
ValueCountFrequency (%)
756
66.7%
/378
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8678
91.6%
None800
 
8.4%

Most frequent character per block

ValueCountFrequency (%)
n1157
13.3%
l1157
13.3%
e825
9.5%
i778
9.0%
k756
8.7%
756
8.7%
t422
 
4.9%
a401
 
4.6%
j400
 
4.6%
T378
 
4.4%
Other values (8)1648
19.0%
ValueCountFrequency (%)
ä422
52.8%
ö378
47.2%

Työaika
Categorical

MISSING

Distinct: 5
Distinct (%): 1.2%
Missing: 16
Missing (%): 3.8%
Memory size: 3.4 KiB

1.0: 384
0.8: 20
0.5: 2
0.6: 1
0.7: 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1224
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value      Count  Frequency (%)
1.0        384    90.6%
0.8        20     4.7%
0.5        2      0.5%
0.6        1      0.2%
0.7        1      0.2%
(Missing)  16     3.8%
Histogram of lengths of the category
ValueCountFrequency (%)
1.0384
94.1%
0.820
 
4.9%
0.52
 
0.5%
0.61
 
0.2%
0.71
 
0.2%

Most occurring characters

ValueCountFrequency (%)
.408
33.3%
0408
33.3%
1384
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number816
66.7%
Other Punctuation408
33.3%

Most frequent character per category

ValueCountFrequency (%)
0408
50.0%
1384
47.1%
820
 
2.5%
52
 
0.2%
71
 
0.1%
61
 
0.1%
ValueCountFrequency (%)
.408
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1224
100.0%

Most frequent character per script

ValueCountFrequency (%)
.408
33.3%
0408
33.3%
1384
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1224
100.0%

Most frequent character per block

ValueCountFrequency (%)
.408
33.3%
0408
33.3%
1384
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 226
Distinct (%): 54.6%
Missing: 10
Missing (%): 2.4%
Memory size: 3.4 KiB

Ohjelmistokehittäjä: 33
full-stack: 29
Full-stack: 21
Arkkitehti: 15
ohjelmistokehittäjä: 14
Other values (221): 302

Length

Max length: 67
Median length: 18
Mean length: 19.17874396
Min length: 2

Characters and Unicode

Total characters: 7940
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 182
Unique (%): 44.0%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Value                           Count  Frequency (%)
Ohjelmistokehittäjä             33     7.8%
full-stack                      29     6.8%
Full-stack                      21     5.0%
Arkkitehti                      15     3.5%
ohjelmistokehittäjä             14     3.3%
Full-stack ohjelmistokehittäjä  8      1.9%
arkkitehti                      6      1.4%
full-stack ohjelmistokehittäjä  6      1.4%
DevOps                          5      1.2%
Frontend                        5      1.2%
Other values (216)              272    64.2%
(Missing)                       10     2.4%
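
Part of Rooli's high cardinality comes from free-text answers that differ only in letter case. As a rough illustration (not something the report itself does), the sketch below case-folds the ten most common raw values listed for Rooli and merges their counts; the helper name `fold_counts` is hypothetical.

```python
from collections import Counter

# Ten most common raw Rooli values and their counts, as listed in this section.
raw_counts = {
    "Ohjelmistokehittäjä": 33, "full-stack": 29, "Full-stack": 21,
    "Arkkitehti": 15, "ohjelmistokehittäjä": 14,
    "Full-stack ohjelmistokehittäjä": 8, "arkkitehti": 6,
    "full-stack ohjelmistokehittäjä": 6, "DevOps": 5, "Frontend": 5,
}


def fold_counts(counts):
    """Merge counts of values that are equal after trimming and case-folding."""
    folded = Counter()
    for value, count in counts.items():
        folded[value.strip().casefold()] += count
    return folded


folded = fold_counts(raw_counts)
print(len(raw_counts), "distinct values ->", len(folded))
print(folded.most_common(3))
```

On these ten values alone, folding reduces ten distinct spellings to six; the effect on the full column cannot be determined from the report.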
Histogram of lengths of the category
ValueCountFrequency (%)
full-stack126
 
16.5%
ohjelmistokehittäjä98
 
12.9%
developer52
 
6.8%
arkkitehti34
 
4.5%
30
 
3.9%
lead27
 
3.5%
frontend23
 
3.0%
senior18
 
2.4%
kehittäjä15
 
2.0%
backend14
 
1.8%
Other values (165)325
42.7%

Most occurring characters

ValueCountFrequency (%)
t831
 
10.5%
e726
 
9.1%
i585
 
7.4%
l581
 
7.3%
k448
 
5.6%
o413
 
5.2%
s380
 
4.8%
a379
 
4.8%
352
 
4.4%
h318
 
4.0%
Other values (47)2927
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6901
86.9%
Uppercase Letter392
 
4.9%
Space Separator353
 
4.4%
Dash Punctuation152
 
1.9%
Other Punctuation86
 
1.1%
Open Punctuation24
 
0.3%
Close Punctuation24
 
0.3%
Math Symbol8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
t831
12.0%
e726
 
10.5%
i585
 
8.5%
l581
 
8.4%
k448
 
6.5%
o413
 
6.0%
s380
 
5.5%
a379
 
5.5%
h318
 
4.6%
j297
 
4.3%
Other values (16)1943
28.2%
ValueCountFrequency (%)
F89
22.7%
O80
20.4%
S44
11.2%
D39
9.9%
A24
 
6.1%
T20
 
5.1%
L16
 
4.1%
C13
 
3.3%
P10
 
2.6%
E10
 
2.6%
Other values (11)47
12.0%
ValueCountFrequency (%)
,50
58.1%
/32
37.2%
&3
 
3.5%
.1
 
1.2%
ValueCountFrequency (%)
352
99.7%
 1
 
0.3%
ValueCountFrequency (%)
-152
100.0%
ValueCountFrequency (%)
(24
100.0%
ValueCountFrequency (%)
)24
100.0%
ValueCountFrequency (%)
+8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7293
91.9%
Common647
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
t831
 
11.4%
e726
 
10.0%
i585
 
8.0%
l581
 
8.0%
k448
 
6.1%
o413
 
5.7%
s380
 
5.2%
a379
 
5.2%
h318
 
4.4%
j297
 
4.1%
Other values (37)2335
32.0%
ValueCountFrequency (%)
352
54.4%
-152
23.5%
,50
 
7.7%
/32
 
4.9%
(24
 
3.7%
)24
 
3.7%
+8
 
1.2%
&3
 
0.5%
.1
 
0.2%
 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7640
96.2%
None300
 
3.8%

Most frequent character per block

ValueCountFrequency (%)
t831
 
10.9%
e726
 
9.5%
i585
 
7.7%
l581
 
7.6%
k448
 
5.9%
o413
 
5.4%
s380
 
5.0%
a379
 
5.0%
352
 
4.6%
h318
 
4.2%
Other values (44)2627
34.4%
ValueCountFrequency (%)
ä284
94.7%
ö15
 
5.0%
 1
 
0.3%

Etä
Categorical

MISSING

Distinct: 2
Distinct (%): 0.7%
Missing: 145
Missing (%): 34.2%
Memory size: 676.0 B

Etä: 178
50/50: 101

Length

Max length: 5
Median length: 3
Mean length: 3.724014337
Min length: 3

Characters and Unicode

Total characters: 1039
Distinct characters: 6
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Value      Count  Frequency (%)
Etä        178    42.0%
50/50      101    23.8%
(Missing)  145    34.2%
Histogram of lengths of the category
ValueCountFrequency (%)
etä178
63.8%
50/50101
36.2%

Most occurring characters

ValueCountFrequency (%)
5202
19.4%
0202
19.4%
E178
17.1%
t178
17.1%
ä178
17.1%
/101
9.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number404
38.9%
Lowercase Letter356
34.3%
Uppercase Letter178
17.1%
Other Punctuation101
 
9.7%

Most frequent character per category

ValueCountFrequency (%)
5202
50.0%
0202
50.0%
ValueCountFrequency (%)
t178
50.0%
ä178
50.0%
ValueCountFrequency (%)
/101
100.0%
ValueCountFrequency (%)
E178
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin534
51.4%
Common505
48.6%

Most frequent character per script

ValueCountFrequency (%)
5202
40.0%
0202
40.0%
/101
20.0%
ValueCountFrequency (%)
E178
33.3%
t178
33.3%
ä178
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII861
82.9%
None178
 
17.1%

Most frequent character per block

ValueCountFrequency (%)
5202
23.5%
0202
23.5%
E178
20.7%
t178
20.7%
/101
11.7%
ValueCountFrequency (%)
ä178
100.0%

Kuukausipalkka
Real number (ℝ≥0)

MISSING

Distinct: 117
Distinct (%): 30.2%
Missing: 37
Missing (%): 8.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 4672.372093
Minimum: 1666
Maximum: 15000
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 1666
5-th percentile: 2806
Q1: 3825
Median: 4500
Q3: 5500
95-th percentile: 7000
Maximum: 15000
Range: 13334
Interquartile range (IQR): 1675

Descriptive statistics

Standard deviation: 1322.304375
Coefficient of variation (CV): 0.2830049382
Kurtosis: 8.972606581
Mean: 4672.372093
Median Absolute Deviation (MAD): 750
Skewness: 1.49001451
Sum: 1808208
Variance: 1748488.861
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
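
"Fixed size bins" means the range between the minimum and maximum is cut into equally wide intervals. The sketch below shows that idea for Kuukausipalkka (minimum 1666, maximum 15000, 50 bins). It assumes the common convention that every bin is half-open except the last, which also includes the maximum; the function names are illustrative and this is not the plotting library's actual code.

```python
def fixed_width_edges(minimum, maximum, bins):
    """Return bins + 1 equally spaced edges from minimum to maximum."""
    width = (maximum - minimum) / bins
    return [minimum + i * width for i in range(bins + 1)]


def bin_index(value, minimum, maximum, bins):
    """Index of the bin containing value; the last bin includes the maximum."""
    if value == maximum:
        return bins - 1
    width = (maximum - minimum) / bins
    return int((value - minimum) // width)


edges = fixed_width_edges(1666, 15000, 50)
print("bin width:", edges[1] - edges[0])
print("median salary 4500 falls in bin", bin_index(4500, 1666, 15000, 50))
```

With a range of 13334 and 50 bins, each bin is about 266.68 wide, so a single outlier at 15000 leaves most of the upper bins empty.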
Common values

Value               Count  Frequency (%)
4000                22     5.2%
4500                20     4.7%
6000                15     3.5%
5000                15     3.5%
5500                14     3.3%
4800                11     2.6%
7000                11     2.6%
4300                11     2.6%
4200                10     2.4%
4100                9      2.1%
Other values (107)  249    58.7%
(Missing)           37     8.7%

Minimum 5 values

Value  Count  Frequency (%)
1666   1      0.2%
1700   1      0.2%
1800   1      0.2%
2100   1      0.2%
2275   1      0.2%

Maximum 5 values

Value  Count  Frequency (%)
15000  1      0.2%
8500   1      0.2%
8000   5      1.2%
7500   2      0.5%
7200   1      0.2%

Vuositulot
Real number (ℝ≥0)

MISSING

Distinct: 168
Distinct (%): 40.7%
Missing: 11
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 65207.78814
Minimum: 0
Maximum: 250000
Zeros: 2
Zeros (%): 0.5%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 35000
Q1: 50000
Median: 60000
Q3: 75000
95-th percentile: 120000
Maximum: 250000
Range: 250000
Interquartile range (IQR): 25000

Descriptive statistics

Standard deviation: 28701.19644
Coefficient of variation (CV): 0.4401498236
Kurtosis: 8.438798453
Mean: 65207.78814
Median Absolute Deviation (MAD): 12000
Skewness: 2.181597667
Sum: 26930816.5
Variance: 823758677.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
55000               16     3.8%
60000               14     3.3%
75000               14     3.3%
50000               14     3.3%
65000               10     2.4%
62500               9      2.1%
85000               9      2.1%
80000               9      2.1%
52000               8      1.9%
40000               8      1.9%
Other values (158)  302    71.2%
(Missing)           11     2.6%

Minimum 5 values

Value  Count  Frequency (%)
0      2      0.5%
4000   1      0.2%
6100   1      0.2%
7500   1      0.2%
20000  1      0.2%

Maximum 5 values

Value   Count  Frequency (%)
250000  1      0.2%
200000  3      0.7%
190000  1      0.2%
180000  1      0.2%
155000  1      0.2%

Kilpailukykyinen
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.5%
Missing: 14
Missing (%): 3.3%
Memory size: 3.4 KiB

True: 287
False: 123
(Missing): 14

Value      Count  Frequency (%)
True       287    67.7%
False      123    29.0%
(Missing)  14     3.3%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 67
Distinct (%): 68.4%
Missing: 326
Missing (%): 76.9%
Memory size: 3.4 KiB

Gofore: 11
Vincit: 6
Fraktio: 4
Futurice: 4
Arado: 3
Other values (62): 70

Length

Max length: 132
Median length: 7
Mean length: 10.59183673
Min length: 2

Characters and Unicode

Total characters: 1038
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 56
Unique (%): 57.1%

Sample

1st row: Questrade
2nd row: Digia Oyj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Value                             Count  Frequency (%)
Gofore                            11     2.6%
Vincit                            6      1.4%
Fraktio                           4      0.9%
Futurice                          4      0.9%
Arado                             3      0.7%
Mavericks                         3      0.7%
Pankki                            3      0.7%
KVTES-alainen kunnan omistama oy  2      0.5%
Qvik                              2      0.5%
Siili                             2      0.5%
Other values (57)                 58     13.7%
(Missing)                         326    76.9%
Histogram of lengths of the category
ValueCountFrequency (%)
gofore13
 
8.4%
oy9
 
5.8%
vincit6
 
3.9%
fraktio4
 
2.6%
oyj4
 
2.6%
mavericks4
 
2.6%
futurice4
 
2.6%
pankki3
 
1.9%
siili3
 
1.9%
omistama3
 
1.9%
Other values (89)102
65.8%

Most occurring characters

ValueCountFrequency (%)
i104
 
10.0%
a82
 
7.9%
o79
 
7.6%
e73
 
7.0%
t71
 
6.8%
60
 
5.8%
r57
 
5.5%
n50
 
4.8%
l41
 
3.9%
u40
 
3.9%
Other values (43)381
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter842
81.1%
Uppercase Letter130
 
12.5%
Space Separator60
 
5.8%
Other Punctuation3
 
0.3%
Dash Punctuation3
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
G14
 
10.8%
O13
 
10.0%
V12
 
9.2%
S11
 
8.5%
F9
 
6.9%
A7
 
5.4%
K7
 
5.4%
C6
 
4.6%
P6
 
4.6%
E5
 
3.8%
Other values (15)40
30.8%
ValueCountFrequency (%)
i104
12.4%
a82
 
9.7%
o79
 
9.4%
e73
 
8.7%
t71
 
8.4%
r57
 
6.8%
n50
 
5.9%
l41
 
4.9%
u40
 
4.8%
k40
 
4.8%
Other values (15)205
24.3%
ValueCountFrequency (%)
60
100.0%
ValueCountFrequency (%)
.3
100.0%
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin972
93.6%
Common66
 
6.4%

Most frequent character per script

ValueCountFrequency (%)
i104
 
10.7%
a82
 
8.4%
o79
 
8.1%
e73
 
7.5%
t71
 
7.3%
r57
 
5.9%
n50
 
5.1%
l41
 
4.2%
u40
 
4.1%
k40
 
4.1%
Other values (40)335
34.5%
ValueCountFrequency (%)
60
90.9%
.3
 
4.5%
-3
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1026
98.8%
None12
 
1.2%

Most frequent character per block

ValueCountFrequency (%)
i104
 
10.1%
a82
 
8.0%
o79
 
7.7%
e73
 
7.1%
t71
 
6.9%
60
 
5.8%
r57
 
5.6%
n50
 
4.9%
l41
 
4.0%
u40
 
3.9%
Other values (41)369
36.0%
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 32
Distinct (%): 97.0%
Missing: 391
Missing (%): 92.2%
Memory size: 3.4 KiB

"palkan lisänä lounas- ja virkistysetu": 2
"Ei sinänsä liity suoraan palkkoihin, mutta olisi mielenkiintoista tietää miten palkka vaikuttaa työpaikan vaihtoon. Eli esim. Oletko vaihtanut/vaihtamassa/miettinyt vaihtamista, koska toisaalla maksetaan enemmän?": 1
"saispa lisää liksaa": 1
"Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.": 1
"Ihan OK. Edut myös kovat.": 1
Other values (27): 27

Length

Max length: 286
Median length: 71
Mean length: 95.54545455
Min length: 7

Characters and Unicode

Total characters: 3153
Distinct characters: 55
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Count  Frequency (%)  Value
2      0.5%           palkan lisänä lounas- ja virkistysetu
1      0.2%           Ei sinänsä liity suoraan palkkoihin, mutta olisi mielenkiintoista tietää miten palkka vaikuttaa työpaikan vaihtoon. Eli esim. Oletko vaihtanut/vaihtamassa/miettinyt vaihtamista, koska toisaalla maksetaan enemmän?
1      0.2%           saispa lisää liksaa
1      0.2%           Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
1      0.2%           Ihan OK. Edut myös kovat.
1      0.2%           Palkka perustuu osittain laskutukseen, joten vuositulot vaihtelevat hieman.
1      0.2%           Työskentelen opintojen ohella, ensimmäisessä frontend devaajan työssä. Olen opiskellut reilu 2 vuotta yliopistossa. Palkkani on mielestäni nyt ihan ok, mutta tarkoituksena nostaa sitä 3000e /kk loppukesään mennessä.
1      0.2%           Korona-aika on lisännyt etätyön määrää. Aiemmin pari päivää viikossa etänä, nyt kokonaan. Paluuta vanhaan ei varmaankaan ole, ehkä päivä viikossa konttorilla ihan sosiaalisten kontaktien takia.
1      0.2%           Halpaa freelancer laskutusta oman tuotekehityksen sivussa
1      0.2%           Bonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.
22     5.2%           Other values (22)
391    92.2%          (Missing)
Histogram of lengths of the category
ValueCountFrequency (%)
ei10
 
2.5%
palkka9
 
2.3%
on8
 
2.0%
ole6
 
1.5%
mutta6
 
1.5%
ja6
 
1.5%
joten4
 
1.0%
palkan4
 
1.0%
nyt4
 
1.0%
firman4
 
1.0%
Other values (281)334
84.6%

Most occurring characters

ValueCountFrequency (%)
365
11.6%
a331
10.5%
i271
 
8.6%
t247
 
7.8%
n216
 
6.9%
s205
 
6.5%
e203
 
6.4%
k185
 
5.9%
l161
 
5.1%
o146
 
4.6%
Other values (45)823
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2625
83.3%
Space Separator365
 
11.6%
Other Punctuation74
 
2.3%
Uppercase Letter47
 
1.5%
Decimal Number27
 
0.9%
Dash Punctuation6
 
0.2%
Open Punctuation3
 
0.1%
Close Punctuation3
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a331
12.6%
i271
10.3%
t247
9.4%
n216
 
8.2%
s205
 
7.8%
e203
 
7.7%
k185
 
7.0%
l161
 
6.1%
o146
 
5.6%
u118
 
4.5%
Other values (14)542
20.6%
ValueCountFrequency (%)
T7
14.9%
P7
14.9%
O6
12.8%
E6
12.8%
V6
12.8%
S4
8.5%
K3
6.4%
I2
 
4.3%
H2
 
4.3%
R1
 
2.1%
Other values (3)3
6.4%
ValueCountFrequency (%)
015
55.6%
13
 
11.1%
52
 
7.4%
22
 
7.4%
82
 
7.4%
62
 
7.4%
31
 
3.7%
ValueCountFrequency (%)
.38
51.4%
,23
31.1%
/5
 
6.8%
%4
 
5.4%
"2
 
2.7%
?2
 
2.7%
ValueCountFrequency (%)
365
100.0%
ValueCountFrequency (%)
(3
100.0%
ValueCountFrequency (%)
)3
100.0%
ValueCountFrequency (%)
+3
100.0%
ValueCountFrequency (%)
-6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2672
84.7%
Common481
 
15.3%

Most frequent character per script

ValueCountFrequency (%)
a331
12.4%
i271
10.1%
t247
9.2%
n216
 
8.1%
s205
 
7.7%
e203
 
7.6%
k185
 
6.9%
l161
 
6.0%
o146
 
5.5%
u118
 
4.4%
Other values (27)589
22.0%
ValueCountFrequency (%)
365
75.9%
.38
 
7.9%
,23
 
4.8%
015
 
3.1%
-6
 
1.2%
/5
 
1.0%
%4
 
0.8%
(3
 
0.6%
)3
 
0.6%
+3
 
0.6%
Other values (8)16
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3017
95.7%
None136
 
4.3%

Most frequent character per block

ValueCountFrequency (%)
365
12.1%
a331
11.0%
i271
 
9.0%
t247
 
8.2%
n216
 
7.2%
s205
 
6.8%
e203
 
6.7%
k185
 
6.1%
l161
 
5.3%
o146
 
4.8%
Other values (43)687
22.8%
ValueCountFrequency (%)
ä112
82.4%
ö24
 
17.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
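
A minimal sketch of that calculation in plain Python follows. The function name `pearson_r` and the example numbers are illustrative and are not taken from the dataset. Because the 1/n factors in the covariance and in the standard deviations cancel, the code works with plain sums.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: years of experience and monthly salary.
experience = [2, 5, 9, 13, 21]
salary = [3000, 3800, 4500, 5500, 7000]

r = pearson_r(experience, salary)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(experience, [2 * s + 100 for s in salary])
print(round(r, 4), round(r_rescaled, 4))
```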

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
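
The same recipe can be sketched in plain Python: replace each value by its rank (ties receive the average of their positions, a common convention assumed here), then apply Pearson's formula to the ranks. The function names and example data are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho is 1, Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), round(pearson_r(x, y), 4))
```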

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
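
The pair-counting definition can be sketched directly. This is the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie-corrected variant, so results on tied data may differ from the report. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```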

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
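
A sketch of a bias-corrected Cramér's V for a contingency table of counts is shown below. It follows the commonly cited form of Bergsma's correction and computes the chi-squared statistic without continuity correction; both choices are assumptions here, and the report's exact implementation may differ. The function name is illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent variables
print(cramers_v_corrected([[20, 0], [0, 20]]))    # perfectly associated variables
```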

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
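
Nullity correlation, as used by the heatmap, can be sketched as the Pearson correlation between the presence indicators of two columns. The code below assumes missing values are represented by `None`; the function name and the toy columns are hypothetical and not taken from the dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between presence indicators (1 = present, 0 = missing)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is fully present or fully missing
    return cov / math.sqrt(var_a * var_b)


# Toy columns: employer and comment are filled in together; remote is filled in otherwise.
employer = ["A", None, "B", None]
comment = ["ok", None, "fine", None]
remote = [None, "Etä", None, "50/50"]

print(nullity_correlation(employer, comment))  # present together
print(nullity_correlation(employer, remote))   # one present when the other is missing
```

A value near +1 means the two variables tend to be present in the same rows; a value near -1 means one tends to be present where the other is missing.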

Sample

First rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN
1 | 2021-02-15 11:57:19.676 | Turku | 31-35 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN
3 | 2021-02-15 11:58:15.261 | Tampere | 31-35 v | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 41-45 v | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 8000.0 | 100000.0 | True | NaN | NaN
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 31-35 v | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN
7 | 2021-02-15 12:00:04.655 | Tampere | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4250.0 | 54000.0 | True | NaN | NaN
8 | 2021-02-15 12:01:00.769 | Tampere | 31-35 v | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | NaN | 4000.0 | 50000.0 | False | NaN | NaN
9 | 2021-02-15 12:02:03.577 | Tallinna | 31-35 v | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN

Last rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
414 | 2021-02-19 15:34:53.741 | Tampere | 26-30 v | muu | 7.0 | Työntekijä / palkollinen | 1.0 | Full-stack developer | 50/50 | 5550.0 | 69400.0 | True | NaN | NaN
415 | 2021-02-19 15:40:16.336 | PK-Seutu | 26-30 v | mies | 5.0 | Työntekijä / palkollinen | 0.8 | Full-stack/mobiili/design | Etä | 7000.0 | 90000.0 | True | Mavericks | NaN
416 | 2021-02-19 16:04:50.348 | Tampere | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4800.0 | 65000.0 | True | NaN | Bonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.
417 | 2021-02-19 16:17:29.891 | PK-Seutu | 36-40 v | nainen | 8.0 | Työntekijä / palkollinen | NaN | Product Owner | 50/50 | 4500.0 | 56200.0 | True | NaN | NaN
418 | 2021-02-19 16:26:32.700 | PK-Seutu | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Mobile SW | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
419 | 2021-02-19 16:33:27.762 | PK-Seutu | 31-35 v | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Full stack | 50/50 | 7000.0 | 87500.0 | True | Mavericks | NaN
420 | 2021-02-19 16:34:07.545 | PK-Seutu | 31-35 v | mies | 12.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
421 | 2021-02-19 16:36:55.938 | Tampere | 41-45 v | mies | 22.0 | Työntekijä / palkollinen | 0.8 | ohjelmistokehittäjä (backend) / arkkitehti | Etä | 4700.0 | 58750.0 | False | NaN | NaN
422 | 2021-02-19 16:38:41.403 | PK-Seutu | 36-40 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | WordPress-kehittäjä | 50/50 | 3000.0 | 37500.0 | False | NaN | NaN
423 | 2021-02-19 16:39:14.831 | Tampere | 31-35 v | mies | 5.0 | Työntekijä / palkollinen | 1.0 | Data scientist | Etä | 4300.0 | 53750.0 | NaN | Wapice | NaN