Overview

Dataset statistics

Number of variables: 14
Number of observations: 425
Missing cells: 996
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.3 KiB
Average record size in memory: 87.5 B

Variable types

DateTime: 1
Categorical: 9
Numeric: 3
Boolean: 1

Warnings

Rooli has a high cardinality: 226 distinct values [High cardinality]
Työpaikka has a high cardinality: 67 distinct values [High cardinality]
Vapaa sana is highly correlated with Työpaikka and 1 other field [High correlation]
Työpaikka is highly correlated with Vapaa sana [High correlation]
Kilpailukykyinen is highly correlated with Vapaa sana [High correlation]
Sukupuoli has 32 (7.5%) missing values [Missing]
Työaika has 16 (3.8%) missing values [Missing]
Rooli has 10 (2.4%) missing values [Missing]
Etä has 145 (34.1%) missing values [Missing]
Kuukausipalkka has 38 (8.9%) missing values [Missing]
Vuositulot has 11 (2.6%) missing values [Missing]
Kilpailukykyinen has 14 (3.3%) missing values [Missing]
Työpaikka has 327 (76.9%) missing values [Missing]
Vapaa sana has 392 (92.2%) missing values [Missing]
Vapaa sana is uniformly distributed [Uniform]
Timestamp has unique values [Unique]

Reproduction

Analysis started: 2021-02-19 14:51:33.971507
Analysis finished: 2021-02-19 14:51:37.708125
Duration: 3.74 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

Timestamp
Date

UNIQUE

Distinct: 425
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-19 16:48:04.696000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

Distinct: 25
Distinct (%): 5.9%
Missing: 4
Missing (%): 0.9%
Memory size: 1.3 KiB

PK-Seutu: 216
Tampere: 99
Turku: 42
Oulu: 22
Jyväskylä: 17
Other values (20): 25

Length

Max length: 15
Median length: 8
Mean length: 7.258907363
Min length: 2

Characters and Unicode

Total characters: 3056
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
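
As a rough illustration of how such per-category counts can be derived, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts`, the `CATEGORY_NAMES` mapping, and the three-value sample are assumptions made for this example; it is not the report's own implementation, and script and block properties are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count Unicode general categories over every character of every string."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical three-value sample; the real column has 421 non-missing values.
sample = ["PK-Seutu", "Turku", "Tampere"]
counts = category_counts(sample)

for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Running the same function over a whole column would give totals comparable to the "Most occurring categories" table for that variable.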

Unique

Unique: 18
Unique (%): 4.3%

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu

Value | Count | Frequency (%)
PK-Seutu | 216 | 50.8%
Tampere | 99 | 23.3%
Turku | 42 | 9.9%
Oulu | 22 | 5.2%
Jyväskylä | 17 | 4.0%
Kuopio | 5 | 1.2%
Pori | 2 | 0.5%
Ruotsi | 1 | 0.2%
Wien | 1 | 0.2%
Viimsi | 1 | 0.2%
Other values (15) | 15 | 3.5%
(Missing) | 4 | 0.9%
Histogram of lengths of the category
ValueCountFrequency (%)
pk-seutu216
50.8%
tampere99
23.3%
turku42
 
9.9%
oulu22
 
5.2%
jyväskylä17
 
4.0%
kuopio5
 
1.2%
pori2
 
0.5%
tallinna1
 
0.2%
länsi-suomi1
 
0.2%
vaasa1
 
0.2%
Other values (19)19
 
4.5%

Most occurring characters

ValueCountFrequency (%)
u572
18.7%
e423
13.8%
K224
 
7.3%
t221
 
7.2%
P219
 
7.2%
-218
 
7.1%
S218
 
7.1%
r147
 
4.8%
T142
 
4.6%
a116
 
3.8%
Other values (29)556
18.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1975
64.6%
Uppercase Letter858
28.1%
Dash Punctuation218
 
7.1%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
u572
29.0%
e423
21.4%
t221
 
11.2%
r147
 
7.4%
a116
 
5.9%
p105
 
5.3%
m103
 
5.2%
k60
 
3.0%
l47
 
2.4%
ä41
 
2.1%
Other values (10)140
 
7.1%
ValueCountFrequency (%)
K224
26.1%
P219
25.5%
S218
25.4%
T142
16.6%
O22
 
2.6%
J18
 
2.1%
E3
 
0.3%
L3
 
0.3%
V2
 
0.2%
W1
 
0.1%
Other values (6)6
 
0.7%
ValueCountFrequency (%)
-218
100.0%
ValueCountFrequency (%)
4
100.0%
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2833
92.7%
Common223
 
7.3%

Most frequent character per script

ValueCountFrequency (%)
u572
20.2%
e423
14.9%
K224
 
7.9%
t221
 
7.8%
P219
 
7.7%
S218
 
7.7%
r147
 
5.2%
T142
 
5.0%
a116
 
4.1%
p105
 
3.7%
Other values (26)446
15.7%
ValueCountFrequency (%)
-218
97.8%
4
 
1.8%
,1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII3015
98.7%
None41
 
1.3%

Most frequent character per block

ValueCountFrequency (%)
u572
19.0%
e423
14.0%
K224
 
7.4%
t221
 
7.3%
P219
 
7.3%
-218
 
7.2%
S218
 
7.2%
r147
 
4.9%
T142
 
4.7%
a116
 
3.8%
Other values (28)515
17.1%
ValueCountFrequency (%)
ä41
100.0%

Ikä
Categorical

Distinct: 7
Distinct (%): 1.7%
Missing: 2
Missing (%): 0.5%
Memory size: 909.0 B

31-35 v: 142
26-30 v: 104
36-40 v: 92
41-45 v: 47
21-25 v: 26
Other values (2): 12

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2961
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31-35 v
2nd row: 31-35 v
3rd row: 26-30 v
4th row: 31-35 v
5th row: 26-30 v

Value | Count | Frequency (%)
31-35 v | 142 | 33.4%
26-30 v | 104 | 24.5%
36-40 v | 92 | 21.6%
41-45 v | 47 | 11.1%
21-25 v | 26 | 6.1%
46-50 v | 7 | 1.6%
51-55 v | 5 | 1.2%
(Missing) | 2 | 0.5%
Histogram of lengths of the category
ValueCountFrequency (%)
v423
50.0%
31-35142
 
16.8%
26-30104
 
12.3%
36-4092
 
10.9%
41-4547
 
5.6%
21-2526
 
3.1%
46-507
 
0.8%
51-555
 
0.6%

Most occurring characters

ValueCountFrequency (%)
3480
16.2%
-423
14.3%
423
14.3%
v423
14.3%
5237
8.0%
1220
7.4%
6203
6.9%
0203
6.9%
4193
6.5%
2156
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1692
57.1%
Dash Punctuation423
 
14.3%
Space Separator423
 
14.3%
Lowercase Letter423
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
3480
28.4%
5237
14.0%
1220
13.0%
6203
12.0%
0203
12.0%
4193
11.4%
2156
 
9.2%
ValueCountFrequency (%)
-423
100.0%
ValueCountFrequency (%)
423
100.0%
ValueCountFrequency (%)
v423
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2538
85.7%
Latin423
 
14.3%

Most frequent character per script

ValueCountFrequency (%)
3480
18.9%
-423
16.7%
423
16.7%
5237
9.3%
1220
8.7%
6203
8.0%
0203
8.0%
4193
7.6%
2156
 
6.1%
ValueCountFrequency (%)
v423
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2961
100.0%

Most frequent character per block

ValueCountFrequency (%)
3480
16.2%
-423
14.3%
423
14.3%
v423
14.3%
5237
8.0%
1220
7.4%
6203
6.9%
0203
6.9%
4193
6.5%
2156
 
5.3%

Sukupuoli
Categorical

MISSING

Distinct: 3
Distinct (%): 0.8%
Missing: 32
Missing (%): 7.5%
Memory size: 685.0 B

mies: 357
nainen: 28
muu: 8

Length

Max length: 6
Median length: 4
Mean length: 4.122137405
Min length: 3

Characters and Unicode

Total characters: 1620
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Value | Count | Frequency (%)
mies | 357 | 84.0%
nainen | 28 | 6.6%
muu | 8 | 1.9%
(Missing) | 32 | 7.5%
Histogram of lengths of the category
ValueCountFrequency (%)
mies357
90.8%
nainen28
 
7.1%
muu8
 
2.0%

Most occurring characters

ValueCountFrequency (%)
i385
23.8%
e385
23.8%
m365
22.5%
s357
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1620
100.0%

Most frequent character per category

ValueCountFrequency (%)
i385
23.8%
e385
23.8%
m365
22.5%
s357
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1620
100.0%

Most frequent character per script

ValueCountFrequency (%)
i385
23.8%
e385
23.8%
m365
22.5%
s357
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1620
100.0%

Most frequent character per block

ValueCountFrequency (%)
i385
23.8%
e385
23.8%
m365
22.5%
s357
22.0%
n84
 
5.2%
a28
 
1.7%
u16
 
1.0%

Työkokemus
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 6.4%
Missing: 4
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.629453682
Minimum: 0
Maximum: 30
Zeros: 3
Zeros (%): 0.7%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 9
Q3: 13
95th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.065553556
Coefficient of variation (CV): 0.6298959169
Kurtosis: -0.03285215062
Mean: 9.629453682
Median Absolute Deviation (MAD): 4
Skewness: 0.7150307253
Sum: 4054
Variance: 36.79093994
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
Value | Count | Frequency (%)
5 | 44 | 10.4%
10 | 34 | 8.0%
4 | 28 | 6.6%
7 | 26 | 6.1%
20 | 25 | 5.9%
15 | 24 | 5.6%
3 | 24 | 5.6%
2 | 22 | 5.2%
11 | 22 | 5.2%
13 | 22 | 5.2%
Other values (17) | 150 | 35.3%

Value | Count | Frequency (%)
0 | 3 | 0.7%
1 | 15 | 3.5%
2 | 22 | 5.2%
3 | 24 | 5.6%
4 | 28 | 6.6%

Value | Count | Frequency (%)
30 | 2 | 0.5%
25 | 5 | 1.2%
24 | 2 | 0.5%
23 | 4 | 0.9%
22 | 4 | 0.9%

Työsuhteen luonne
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.4 KiB

Työntekijä / palkollinen: 379
Freelancer: 23
Yrittäjä: 22

Length

Max length: 24
Median length: 24
Mean length: 22.41037736
Min length: 8

Characters and Unicode

Total characters: 9502
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Value | Count | Frequency (%)
Työntekijä / palkollinen | 379 | 89.2%
Freelancer | 23 | 5.4%
Yrittäjä | 22 | 5.2%
(Missing) | 1 | 0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
379
32.1%
palkollinen379
32.1%
työntekijä379
32.1%
freelancer23
 
1.9%
yrittäjä22
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n1160
12.2%
l1160
12.2%
e827
 
8.7%
i780
 
8.2%
k758
 
8.0%
758
 
8.0%
t423
 
4.5%
ä423
 
4.5%
a402
 
4.2%
j401
 
4.2%
Other values (10)2410
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7941
83.6%
Space Separator758
 
8.0%
Uppercase Letter424
 
4.5%
Other Punctuation379
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
n1160
14.6%
l1160
14.6%
e827
10.4%
i780
9.8%
k758
9.5%
t423
 
5.3%
ä423
 
5.3%
a402
 
5.1%
j401
 
5.0%
y379
 
4.8%
Other values (5)1228
15.5%
ValueCountFrequency (%)
T379
89.4%
F23
 
5.4%
Y22
 
5.2%
ValueCountFrequency (%)
758
100.0%
ValueCountFrequency (%)
/379
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8365
88.0%
Common1137
 
12.0%

Most frequent character per script

ValueCountFrequency (%)
n1160
13.9%
l1160
13.9%
e827
9.9%
i780
9.3%
k758
9.1%
t423
 
5.1%
ä423
 
5.1%
a402
 
4.8%
j401
 
4.8%
T379
 
4.5%
Other values (8)1652
19.7%
ValueCountFrequency (%)
758
66.7%
/379
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8700
91.6%
None802
 
8.4%

Most frequent character per block

ValueCountFrequency (%)
n1160
13.3%
l1160
13.3%
e827
9.5%
i780
9.0%
k758
8.7%
758
8.7%
t423
 
4.9%
a402
 
4.6%
j401
 
4.6%
T379
 
4.4%
Other values (8)1652
19.0%
ValueCountFrequency (%)
ä423
52.7%
ö379
47.3%

Työaika
Categorical

MISSING

Distinct: 5
Distinct (%): 1.2%
Missing: 16
Missing (%): 3.8%
Memory size: 3.4 KiB

1.0: 385
0.8: 20
0.5: 2
0.7: 1
0.6: 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1227
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
1.0 | 385 | 90.6%
0.8 | 20 | 4.7%
0.5 | 2 | 0.5%
0.7 | 1 | 0.2%
0.6 | 1 | 0.2%
(Missing) | 16 | 3.8%
Histogram of lengths of the category
ValueCountFrequency (%)
1.0385
94.1%
0.820
 
4.9%
0.52
 
0.5%
0.71
 
0.2%
0.61
 
0.2%

Most occurring characters

ValueCountFrequency (%)
.409
33.3%
0409
33.3%
1385
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number818
66.7%
Other Punctuation409
33.3%

Most frequent character per category

ValueCountFrequency (%)
0409
50.0%
1385
47.1%
820
 
2.4%
52
 
0.2%
71
 
0.1%
61
 
0.1%
ValueCountFrequency (%)
.409
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1227
100.0%

Most frequent character per script

ValueCountFrequency (%)
.409
33.3%
0409
33.3%
1385
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1227
100.0%

Most frequent character per block

ValueCountFrequency (%)
.409
33.3%
0409
33.3%
1385
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 226
Distinct (%): 54.5%
Missing: 10
Missing (%): 2.4%
Memory size: 3.4 KiB

Ohjelmistokehittäjä: 33
full-stack: 29
Full-stack: 21
ohjelmistokehittäjä: 15
Arkkitehti: 15
Other values (221): 302

Length

Max length: 67
Median length: 18
Mean length: 19.17831325
Min length: 2

Characters and Unicode

Total characters: 7959
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 182
Unique (%): 43.9%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Value | Count | Frequency (%)
Ohjelmistokehittäjä | 33 | 7.8%
full-stack | 29 | 6.8%
Full-stack | 21 | 4.9%
ohjelmistokehittäjä | 15 | 3.5%
Arkkitehti | 15 | 3.5%
Full-stack ohjelmistokehittäjä | 8 | 1.9%
arkkitehti | 6 | 1.4%
full-stack ohjelmistokehittäjä | 6 | 1.4%
Frontend | 5 | 1.2%
DevOps | 5 | 1.2%
Other values (216) | 272 | 64.0%
(Missing) | 10 | 2.4%
Histogram of lengths of the category
ValueCountFrequency (%)
full-stack126
 
16.5%
ohjelmistokehittäjä99
 
13.0%
developer52
 
6.8%
arkkitehti34
 
4.5%
30
 
3.9%
lead27
 
3.5%
frontend23
 
3.0%
senior18
 
2.4%
kehittäjä15
 
2.0%
backend14
 
1.8%
Other values (165)325
42.6%

Most occurring characters

ValueCountFrequency (%)
t834
 
10.5%
e728
 
9.1%
i587
 
7.4%
l582
 
7.3%
k449
 
5.6%
o415
 
5.2%
s381
 
4.8%
a379
 
4.8%
352
 
4.4%
h320
 
4.0%
Other values (47)2932
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6920
86.9%
Uppercase Letter392
 
4.9%
Space Separator353
 
4.4%
Dash Punctuation152
 
1.9%
Other Punctuation86
 
1.1%
Open Punctuation24
 
0.3%
Close Punctuation24
 
0.3%
Math Symbol8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
t834
12.1%
e728
 
10.5%
i587
 
8.5%
l582
 
8.4%
k449
 
6.5%
o415
 
6.0%
s381
 
5.5%
a379
 
5.5%
h320
 
4.6%
j299
 
4.3%
Other values (16)1946
28.1%
ValueCountFrequency (%)
F89
22.7%
O80
20.4%
S44
11.2%
D39
9.9%
A24
 
6.1%
T20
 
5.1%
L16
 
4.1%
C13
 
3.3%
P10
 
2.6%
E10
 
2.6%
Other values (11)47
12.0%
ValueCountFrequency (%)
,50
58.1%
/32
37.2%
&3
 
3.5%
.1
 
1.2%
ValueCountFrequency (%)
352
99.7%
 1
 
0.3%
ValueCountFrequency (%)
-152
100.0%
ValueCountFrequency (%)
(24
100.0%
ValueCountFrequency (%)
)24
100.0%
ValueCountFrequency (%)
+8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7312
91.9%
Common647
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
t834
 
11.4%
e728
 
10.0%
i587
 
8.0%
l582
 
8.0%
k449
 
6.1%
o415
 
5.7%
s381
 
5.2%
a379
 
5.2%
h320
 
4.4%
j299
 
4.1%
Other values (37)2338
32.0%
ValueCountFrequency (%)
352
54.4%
-152
23.5%
,50
 
7.7%
/32
 
4.9%
(24
 
3.7%
)24
 
3.7%
+8
 
1.2%
&3
 
0.5%
.1
 
0.2%
 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7657
96.2%
None302
 
3.8%

Most frequent character per block

ValueCountFrequency (%)
t834
 
10.9%
e728
 
9.5%
i587
 
7.7%
l582
 
7.6%
k449
 
5.9%
o415
 
5.4%
s381
 
5.0%
a379
 
4.9%
352
 
4.6%
h320
 
4.2%
Other values (44)2630
34.3%
ValueCountFrequency (%)
ä286
94.7%
ö15
 
5.0%
 1
 
0.3%

Etä
Categorical

MISSING

Distinct: 2
Distinct (%): 0.7%
Missing: 145
Missing (%): 34.1%
Memory size: 677.0 B

Etä: 178
50/50: 102

Length

Max length: 5
Median length: 3
Mean length: 3.728571429
Min length: 3

Characters and Unicode

Total characters: 1044
Distinct characters: 6
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Value | Count | Frequency (%)
Etä | 178 | 41.9%
50/50 | 102 | 24.0%
(Missing) | 145 | 34.1%
Histogram of lengths of the category
ValueCountFrequency (%)
etä178
63.6%
50/50102
36.4%

Most occurring characters

ValueCountFrequency (%)
5204
19.5%
0204
19.5%
E178
17.0%
t178
17.0%
ä178
17.0%
/102
9.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
39.1%
Lowercase Letter356
34.1%
Uppercase Letter178
17.0%
Other Punctuation102
 
9.8%

Most frequent character per category

ValueCountFrequency (%)
5204
50.0%
0204
50.0%
ValueCountFrequency (%)
t178
50.0%
ä178
50.0%
ValueCountFrequency (%)
/102
100.0%
ValueCountFrequency (%)
E178
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin534
51.1%
Common510
48.9%

Most frequent character per script

ValueCountFrequency (%)
5204
40.0%
0204
40.0%
/102
20.0%
ValueCountFrequency (%)
E178
33.3%
t178
33.3%
ä178
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII866
83.0%
None178
 
17.0%

Most frequent character per block

ValueCountFrequency (%)
5204
23.6%
0204
23.6%
E178
20.6%
t178
20.6%
/102
11.8%
ValueCountFrequency (%)
ä178
100.0%

Kuukausipalkka
Real number (ℝ≥0)

MISSING

Distinct: 117
Distinct (%): 30.2%
Missing: 38
Missing (%): 8.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4672.372093
Minimum: 1666
Maximum: 15000
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 1666
5th percentile: 2806
Q1: 3825
Median: 4500
Q3: 5500
95th percentile: 7000
Maximum: 15000
Range: 13334
Interquartile range (IQR): 1675

Descriptive statistics

Standard deviation: 1322.304375
Coefficient of variation (CV): 0.2830049382
Kurtosis: 8.972606581
Mean: 4672.372093
Median Absolute Deviation (MAD): 750
Skewness: 1.49001451
Sum: 1808208
Variance: 1748488.861
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4000 | 22 | 5.2%
4500 | 20 | 4.7%
6000 | 15 | 3.5%
5000 | 15 | 3.5%
5500 | 14 | 3.3%
4800 | 11 | 2.6%
7000 | 11 | 2.6%
4300 | 11 | 2.6%
4200 | 10 | 2.4%
4100 | 9 | 2.1%
Other values (107) | 249 | 58.6%
(Missing) | 38 | 8.9%

Value | Count | Frequency (%)
1666 | 1 | 0.2%
1700 | 1 | 0.2%
1800 | 1 | 0.2%
2100 | 1 | 0.2%
2275 | 1 | 0.2%

Value | Count | Frequency (%)
15000 | 1 | 0.2%
8500 | 1 | 0.2%
8000 | 5 | 1.2%
7500 | 2 | 0.5%
7200 | 1 | 0.2%

Vuositulot
Real number (ℝ≥0)

MISSING

Distinct: 168
Distinct (%): 40.6%
Missing: 11
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 65291.82729
Minimum: 0
Maximum: 250000
Zeros: 2
Zeros (%): 0.5%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 35000
Q1: 50000
Median: 60000
Q3: 75000
95th percentile: 120000
Maximum: 250000
Range: 250000
Interquartile range (IQR): 25000

Descriptive statistics

Standard deviation: 28717.38171
Coefficient of variation (CV): 0.4398311842
Kurtosis: 8.364561254
Mean: 65291.82729
Median Absolute Deviation (MAD): 12000
Skewness: 2.168117007
Sum: 27030816.5
Variance: 824688012.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
55000 | 16 | 3.8%
60000 | 14 | 3.3%
75000 | 14 | 3.3%
50000 | 14 | 3.3%
65000 | 10 | 2.4%
62500 | 9 | 2.1%
85000 | 9 | 2.1%
80000 | 9 | 2.1%
52000 | 8 | 1.9%
40000 | 8 | 1.9%
Other values (158) | 303 | 71.3%
(Missing) | 11 | 2.6%

Value | Count | Frequency (%)
0 | 2 | 0.5%
4000 | 1 | 0.2%
6100 | 1 | 0.2%
7500 | 1 | 0.2%
20000 | 1 | 0.2%

Value | Count | Frequency (%)
250000 | 1 | 0.2%
200000 | 3 | 0.7%
190000 | 1 | 0.2%
180000 | 1 | 0.2%
155000 | 1 | 0.2%

Kilpailukykyinen
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.5%
Missing: 14
Missing (%): 3.3%
Memory size: 3.4 KiB

Value | Count | Frequency (%)
True | 288 | 67.8%
False | 123 | 28.9%
(Missing) | 14 | 3.3%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 67
Distinct (%): 68.4%
Missing: 327
Missing (%): 76.9%
Memory size: 3.4 KiB

Gofore: 11
Vincit: 6
Fraktio: 4
Futurice: 4
Mavericks: 3
Other values (62): 70

Length

Max length: 132
Median length: 7
Mean length: 10.59183673
Min length: 2

Characters and Unicode

Total characters: 1038
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 56
Unique (%): 57.1%

Sample

1st row: Questrade
2nd row: Digia Oyj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Value | Count | Frequency (%)
Gofore | 11 | 2.6%
Vincit | 6 | 1.4%
Fraktio | 4 | 0.9%
Futurice | 4 | 0.9%
Mavericks | 3 | 0.7%
Pankki | 3 | 0.7%
Arado | 3 | 0.7%
KVTES-alainen kunnan omistama oy | 2 | 0.5%
Qvik | 2 | 0.5%
Gofore Oyj | 2 | 0.5%
Other values (57) | 58 | 13.6%
(Missing) | 327 | 76.9%
Histogram of lengths of the category
ValueCountFrequency (%)
gofore13
 
8.4%
oy9
 
5.8%
vincit6
 
3.9%
mavericks4
 
2.6%
fraktio4
 
2.6%
oyj4
 
2.6%
futurice4
 
2.6%
arado3
 
1.9%
pankki3
 
1.9%
siili3
 
1.9%
Other values (89)102
65.8%

Most occurring characters

ValueCountFrequency (%)
i104
 
10.0%
a82
 
7.9%
o79
 
7.6%
e73
 
7.0%
t71
 
6.8%
60
 
5.8%
r57
 
5.5%
n50
 
4.8%
l41
 
3.9%
u40
 
3.9%
Other values (43)381
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter842
81.1%
Uppercase Letter130
 
12.5%
Space Separator60
 
5.8%
Other Punctuation3
 
0.3%
Dash Punctuation3
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
G14
 
10.8%
O13
 
10.0%
V12
 
9.2%
S11
 
8.5%
F9
 
6.9%
A7
 
5.4%
K7
 
5.4%
C6
 
4.6%
P6
 
4.6%
E5
 
3.8%
Other values (15)40
30.8%
ValueCountFrequency (%)
i104
12.4%
a82
 
9.7%
o79
 
9.4%
e73
 
8.7%
t71
 
8.4%
r57
 
6.8%
n50
 
5.9%
l41
 
4.9%
u40
 
4.8%
k40
 
4.8%
Other values (15)205
24.3%
ValueCountFrequency (%)
60
100.0%
ValueCountFrequency (%)
.3
100.0%
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin972
93.6%
Common66
 
6.4%

Most frequent character per script

ValueCountFrequency (%)
i104
 
10.7%
a82
 
8.4%
o79
 
8.1%
e73
 
7.5%
t71
 
7.3%
r57
 
5.9%
n50
 
5.1%
l41
 
4.2%
u40
 
4.1%
k40
 
4.1%
Other values (40)335
34.5%
ValueCountFrequency (%)
60
90.9%
.3
 
4.5%
-3
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1026
98.8%
None12
 
1.2%

Most frequent character per block

ValueCountFrequency (%)
i104
 
10.1%
a82
 
8.0%
o79
 
7.7%
e73
 
7.1%
t71
 
6.9%
60
 
5.8%
r57
 
5.6%
n50
 
4.9%
l41
 
4.0%
u40
 
3.9%
Other values (41)369
36.0%
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 32
Distinct (%): 97.0%
Missing: 392
Missing (%): 92.2%
Memory size: 3.4 KiB

palkan lisänä lounas- ja virkistysetu: 2
Osittain laskutukseen perustuva palkka joten vaihtelee.: 1
Ilmaset kaffet, safkat, salit jne.: 1
Halpaa freelancer laskutusta oman tuotekehityksen sivussa: 1
Opiskelija: 1
Other values (27): 27

Length

Max length: 286
Median length: 71
Mean length: 95.54545455
Min length: 7

Characters and Unicode

Total characters: 3153
Distinct characters: 55
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Value | Count | Frequency (%)
palkan lisänä lounas- ja virkistysetu | 2 | 0.5%
Osittain laskutukseen perustuva palkka joten vaihtelee. | 1 | 0.2%
Ilmaset kaffet, safkat, salit jne. | 1 | 0.2%
Halpaa freelancer laskutusta oman tuotekehityksen sivussa | 1 | 0.2%
Opiskelija | 1 | 0.2%
Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi. | 1 | 0.2%
+ merkittävä optiopaketti | 1 | 0.2%
saispa lisää liksaa | 1 | 0.2%
Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys. | 1 | 0.2%
Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa. | 1 | 0.2%
Other values (22) | 22 | 5.2%
(Missing) | 392 | 92.2%
Histogram of lengths of the category
ValueCountFrequency (%)
ei10
 
2.5%
palkka9
 
2.3%
on8
 
2.0%
ja6
 
1.5%
ole6
 
1.5%
mutta6
 
1.5%
ihan4
 
1.0%
joten4
 
1.0%
palkan4
 
1.0%
nyt4
 
1.0%
Other values (281)334
84.6%

Most occurring characters

ValueCountFrequency (%)
365
11.6%
a331
10.5%
i271
 
8.6%
t247
 
7.8%
n216
 
6.9%
s205
 
6.5%
e203
 
6.4%
k185
 
5.9%
l161
 
5.1%
o146
 
4.6%
Other values (45)823
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2625
83.3%
Space Separator365
 
11.6%
Other Punctuation74
 
2.3%
Uppercase Letter47
 
1.5%
Decimal Number27
 
0.9%
Dash Punctuation6
 
0.2%
Open Punctuation3
 
0.1%
Close Punctuation3
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a331
12.6%
i271
10.3%
t247
9.4%
n216
 
8.2%
s205
 
7.8%
e203
 
7.7%
k185
 
7.0%
l161
 
6.1%
o146
 
5.6%
u118
 
4.5%
Other values (14)542
20.6%
ValueCountFrequency (%)
T7
14.9%
P7
14.9%
O6
12.8%
E6
12.8%
V6
12.8%
S4
8.5%
K3
6.4%
I2
 
4.3%
H2
 
4.3%
R1
 
2.1%
Other values (3)3
6.4%
ValueCountFrequency (%)
015
55.6%
13
 
11.1%
52
 
7.4%
22
 
7.4%
82
 
7.4%
62
 
7.4%
31
 
3.7%
ValueCountFrequency (%)
.38
51.4%
,23
31.1%
/5
 
6.8%
%4
 
5.4%
"2
 
2.7%
?2
 
2.7%
ValueCountFrequency (%)
365
100.0%
ValueCountFrequency (%)
(3
100.0%
ValueCountFrequency (%)
)3
100.0%
ValueCountFrequency (%)
+3
100.0%
ValueCountFrequency (%)
-6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2672
84.7%
Common481
 
15.3%

Most frequent character per script

ValueCountFrequency (%)
a331
12.4%
i271
10.1%
t247
9.2%
n216
 
8.1%
s205
 
7.7%
e203
 
7.6%
k185
 
6.9%
l161
 
6.0%
o146
 
5.5%
u118
 
4.4%
Other values (27)589
22.0%
ValueCountFrequency (%)
365
75.9%
.38
 
7.9%
,23
 
4.8%
015
 
3.1%
-6
 
1.2%
/5
 
1.0%
%4
 
0.8%
(3
 
0.6%
)3
 
0.6%
+3
 
0.6%
Other values (8)16
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3017
95.7%
None136
 
4.3%

Most frequent character per block

ValueCountFrequency (%)
365
12.1%
a331
11.0%
i271
 
9.0%
t247
 
8.2%
n216
 
7.2%
s205
 
6.8%
e203
 
6.7%
k185
 
6.1%
l161
 
5.3%
o146
 
4.8%
Other values (43)687
22.8%
ValueCountFrequency (%)
ä112
82.4%
ö24
 
17.6%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
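
The calculation just described can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` is an assumption, and the data are the nine complete Kuukausipalkka/Vuositulot pairs from the "First rows" sample, so the result says nothing reliable about the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Monthly salary and yearly income from rows 0-8 of the "First rows" sample
# (row 9 is left out because its monthly salary is missing).
monthly = [6500, 5000, 2475, 4300, 3000, 8000, 6000, 4250, 4000]
yearly = [83000, 62500, 30000, 100000, 37500, 100000, 140000, 54000, 50000]

r = pearson_r(monthly, yearly)
print(round(r, 3))
```

Rescaling either list by a positive factor or shifting it by a constant leaves `r` unchanged, which is the invariance property mentioned above.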

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
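
A small sketch of that procedure follows: each variable is replaced by its ranks (ties share the average rank) and Pearson's formula is applied to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are assumptions for this example, and the cubic toy data are made up to show a nonlinear but monotonic relationship.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up monotonic but nonlinear data: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy input the rank-based coefficient reaches its maximum while Pearson's r stays below it, which is the difference the paragraph above points to.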

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
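
The pair-counting definition above translates almost directly into code. The sketch below implements exactly that simple form (often called tau-a); the function name `kendall_tau_a` is an assumption, and the report's own figures may come from a tie-adjusted variant, so values can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    observations = list(zip(xs, ys))
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(observations, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    total_pairs = len(observations) * (len(observations) - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up example: one of the three pairs is out of order.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

The quadratic loop over all pairs is fine for a few hundred observations such as the 425 rows here; larger datasets would normally use a faster library routine.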

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
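
For readers who want to see what a bias-corrected Cramér's V looks like in practice, here is a hedged pure-Python sketch of the correction as it is commonly stated: the squared phi coefficient is reduced by (k-1)(r-1)/(n-1) and the table dimensions r and k are shrunk accordingly. The function name `cramers_v_corrected` and the toy inputs are assumptions, and the report's actual implementation may differ in details such as continuity correction.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    observed = Counter(pairs)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(phi2_corr / denominator)


# Made-up labels: the second list is fully determined by the first.
v_perfect = cramers_v_corrected(["a", "b"] * 10, ["u", "v"] * 10)
# Made-up labels: every combination occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["u", "v", "u", "v"] * 5)
print(v_perfect, v_independent)
```

Missing values are dropped pairwise here, which is one reasonable choice but an assumption; with columns as sparse as Työpaikka or Vapaa sana, only a small number of rows would remain to support the estimate.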

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
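
One common way to obtain such a nullity correlation is to turn each column into a present/missing indicator and correlate the indicators. The sketch below does that in plain Python; the names `is_missing` and `nullity_correlation` and the four-row toy columns are assumptions for illustration, not the plotting library's actual code.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 0/1 presence indicators of two columns.

    Returns None when either column is always present or always missing,
    because a constant indicator has no defined correlation.
    """
    a = [0.0 if is_missing(v) else 1.0 for v in col_a]
    b = [0.0 if is_missing(v) else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        return None
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / math.sqrt(ss_a * ss_b)


# Made-up columns: the first two are missing in the same rows,
# the third is present exactly where the first is missing.
employer = ["Gofore", None, "Vincit", None]
comment = ["text", None, "text", None]
remote = [None, "Etä", float("nan"), "50/50"]

print(nullity_correlation(employer, comment))
print(nullity_correlation(employer, remote))
```

A value near +1 means two columns tend to be filled in together, a value near -1 means one tends to be filled in when the other is empty.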
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN
1 | 2021-02-15 11:57:19.676 | Turku | 31-35 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN
3 | 2021-02-15 11:58:15.261 | Tampere | 31-35 v | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 41-45 v | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 8000.0 | 100000.0 | True | NaN | NaN
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 31-35 v | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN
7 | 2021-02-15 12:00:04.655 | Tampere | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4250.0 | 54000.0 | True | NaN | NaN
8 | 2021-02-15 12:01:00.769 | Tampere | 31-35 v | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | NaN | 4000.0 | 50000.0 | False | NaN | NaN
9 | 2021-02-15 12:02:03.577 | Tallinna | 31-35 v | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN

Last rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
415 | 2021-02-19 15:40:16.336 | PK-Seutu | 26-30 v | mies | 5.0 | Työntekijä / palkollinen | 0.8 | Full-stack/mobiili/design | Etä | 7000.0 | 90000.0 | True | Mavericks | NaN
416 | 2021-02-19 16:04:50.348 | Tampere | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4800.0 | 65000.0 | True | NaN | Bonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.
417 | 2021-02-19 16:17:29.891 | PK-Seutu | 36-40 v | nainen | 8.0 | Työntekijä / palkollinen | NaN | Product Owner | 50/50 | 4500.0 | 56200.0 | True | NaN | NaN
418 | 2021-02-19 16:26:32.700 | PK-Seutu | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Mobile SW | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
419 | 2021-02-19 16:33:27.762 | PK-Seutu | 31-35 v | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Full stack | 50/50 | 7000.0 | 87500.0 | True | Mavericks | NaN
420 | 2021-02-19 16:34:07.545 | PK-Seutu | 31-35 v | mies | 12.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
421 | 2021-02-19 16:36:55.938 | Tampere | 41-45 v | mies | 22.0 | Työntekijä / palkollinen | 0.8 | ohjelmistokehittäjä (backend) / arkkitehti | Etä | 4700.0 | 58750.0 | False | NaN | NaN
422 | 2021-02-19 16:38:41.403 | PK-Seutu | 36-40 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | WordPress-kehittäjä | 50/50 | 3000.0 | 37500.0 | False | NaN | NaN
423 | 2021-02-19 16:39:14.831 | Tampere | 31-35 v | mies | 5.0 | Työntekijä / palkollinen | 1.0 | Data scientist | Etä | 4300.0 | 53750.0 | NaN | Wapice | NaN
424 | 2021-02-19 16:48:04.696 | PK-Seutu | 41-45 v | mies | 15.0 | Työntekijä / palkollinen | 1.0 | ohjelmistokehittäjä | 50/50 | NaN | 100000.0 | True | NaN | NaN