Overview

Dataset statistics

Number of variables: 14
Number of observations: 419
Missing cells: 985
Missing cells (%): 16.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.8 KiB
Average record size in memory: 87.6 B
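
The missing-cell percentage follows from the counts above: the table holds 14 × 419 cells, of which 985 are missing. A minimal Python check of that arithmetic, using only figures reported in this section (the variable names are illustrative):

```python
# Figures taken from the dataset statistics above.
n_variables = 14
n_observations = 419
missing_cells = 985

total_cells = n_variables * n_observations      # every variable/observation pair
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```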

Variable types

DateTime: 1
Categorical: 9
Numeric: 3
Boolean: 1

Warnings

Rooli has a high cardinality: 224 distinct values (High cardinality)
Työpaikka has a high cardinality: 66 distinct values (High cardinality)
Työpaikka is highly correlated with Vapaa sana (High correlation)
Kilpailukykyinen is highly correlated with Vapaa sana (High correlation)
Vapaa sana is highly correlated with Työpaikka and 1 other field (High correlation)
Sukupuoli has 32 (7.6%) missing values (Missing)
Työaika has 16 (3.8%) missing values (Missing)
Rooli has 10 (2.4%) missing values (Missing)
Etä has 145 (34.6%) missing values (Missing)
Kuukausipalkka has 37 (8.8%) missing values (Missing)
Vuositulot has 11 (2.6%) missing values (Missing)
Kilpailukykyinen has 13 (3.1%) missing values (Missing)
Työpaikka has 324 (77.3%) missing values (Missing)
Vapaa sana has 386 (92.1%) missing values (Missing)
Vapaa sana is uniformly distributed (Uniform)
Timestamp has unique values (Unique)

Reproduction

Analysis started: 2021-02-19 14:29:51.058853
Analysis finished: 2021-02-19 14:29:54.947410
Duration: 3.89 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

Timestamp
Date

UNIQUE

Distinct: 419
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-19 16:26:32.700000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

Distinct: 25
Distinct (%): 6.0%
Missing: 4
Missing (%): 1.0%
Memory size: 1.3 KiB
PK-Seutu: 212
Tampere: 97
Turku: 42
Oulu: 22
Jyväskylä: 17
Other values (20): 25

Length

Max length: 15
Median length: 8
Mean length: 7.253012048
Min length: 2

Characters and Unicode

Total characters: 3010
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 4.3%

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu

Value | Count | Frequency (%)
PK-Seutu | 212 | 50.6%
Tampere | 97 | 23.2%
Turku | 42 | 10.0%
Oulu | 22 | 5.3%
Jyväskylä | 17 | 4.1%
Kuopio | 5 | 1.2%
Pori | 2 | 0.5%
Ruotsi | 1 | 0.2%
Wien | 1 | 0.2%
Viimsi | 1 | 0.2%
Other values (15) | 15 | 3.6%
(Missing) | 4 | 1.0%

Histogram of lengths of the category

Value | Count | Frequency (%)
pk-seutu | 212 | 50.6%
tampere | 97 | 23.2%
turku | 42 | 10.0%
oulu | 22 | 5.3%
jyväskylä | 17 | 4.1%
kuopio | 5 | 1.2%
pori | 2 | 0.5%
länsi-suomi | 1 | 0.2%
vaasa | 1 | 0.2%
york | 1 | 0.2%
Other values (19) | 19 | 4.5%

Most occurring characters

ValueCountFrequency (%)
u564
18.7%
e415
13.8%
K220
 
7.3%
t217
 
7.2%
P215
 
7.1%
-214
 
7.1%
S214
 
7.1%
r145
 
4.8%
T140
 
4.7%
a114
 
3.8%
Other values (29)552
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1947
64.7%
Uppercase Letter844
28.0%
Dash Punctuation214
 
7.1%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
u564
29.0%
e415
21.3%
t217
 
11.1%
r145
 
7.4%
a114
 
5.9%
p103
 
5.3%
m101
 
5.2%
k60
 
3.1%
l47
 
2.4%
ä41
 
2.1%
Other values (10)140
 
7.2%
ValueCountFrequency (%)
K220
26.1%
P215
25.5%
S214
25.4%
T140
16.6%
O22
 
2.6%
J18
 
2.1%
E3
 
0.4%
L3
 
0.4%
V2
 
0.2%
W1
 
0.1%
Other values (6)6
 
0.7%
ValueCountFrequency (%)
-214
100.0%
ValueCountFrequency (%)
4
100.0%
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2791
92.7%
Common219
 
7.3%

Most frequent character per script

ValueCountFrequency (%)
u564
20.2%
e415
14.9%
K220
 
7.9%
t217
 
7.8%
P215
 
7.7%
S214
 
7.7%
r145
 
5.2%
T140
 
5.0%
a114
 
4.1%
p103
 
3.7%
Other values (26)444
15.9%
ValueCountFrequency (%)
-214
97.7%
4
 
1.8%
,1
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2969
98.6%
None41
 
1.4%

Most frequent character per block

ValueCountFrequency (%)
u564
19.0%
e415
14.0%
K220
 
7.4%
t217
 
7.3%
P215
 
7.2%
-214
 
7.2%
S214
 
7.2%
r145
 
4.9%
T140
 
4.7%
a114
 
3.8%
Other values (28)511
17.2%
ValueCountFrequency (%)
ä41
100.0%

Ikä
Categorical

Distinct: 7
Distinct (%): 1.7%
Missing: 2
Missing (%): 0.5%
Memory size: 903.0 B
31-35 v: 139
26-30 v: 104
36-40 v: 91
41-45 v: 45
21-25 v: 26
Other values (2): 12

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2919
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31-35 v
2nd row: 31-35 v
3rd row: 26-30 v
4th row: 31-35 v
5th row: 26-30 v

Value | Count | Frequency (%)
31-35 v | 139 | 33.2%
26-30 v | 104 | 24.8%
36-40 v | 91 | 21.7%
41-45 v | 45 | 10.7%
21-25 v | 26 | 6.2%
46-50 v | 7 | 1.7%
51-55 v | 5 | 1.2%
(Missing) | 2 | 0.5%

Histogram of lengths of the category

Value | Count | Frequency (%)
v | 417 | 50.0%
31-35 | 139 | 16.7%
26-30 | 104 | 12.5%
36-40 | 91 | 10.9%
41-45 | 45 | 5.4%
21-25 | 26 | 3.1%
46-50 | 7 | 0.8%
51-55 | 5 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
3 | 473 | 16.2%
- | 417 | 14.3%
(space) | 417 | 14.3%
v | 417 | 14.3%
5 | 232 | 7.9%
1 | 215 | 7.4%
6 | 202 | 6.9%
0 | 202 | 6.9%
4 | 188 | 6.4%
2 | 156 | 5.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1668 | 57.1%
Dash Punctuation | 417 | 14.3%
Space Separator | 417 | 14.3%
Lowercase Letter | 417 | 14.3%

Most frequent character per category

Decimal Number:
Value | Count | Frequency (%)
3 | 473 | 28.4%
5 | 232 | 13.9%
1 | 215 | 12.9%
6 | 202 | 12.1%
0 | 202 | 12.1%
4 | 188 | 11.3%
2 | 156 | 9.4%

Dash Punctuation:
Value | Count | Frequency (%)
- | 417 | 100.0%

Space Separator:
Value | Count | Frequency (%)
(space) | 417 | 100.0%

Lowercase Letter:
Value | Count | Frequency (%)
v | 417 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2502 | 85.7%
Latin | 417 | 14.3%

Most frequent character per script

Common:
Value | Count | Frequency (%)
3 | 473 | 18.9%
- | 417 | 16.7%
(space) | 417 | 16.7%
5 | 232 | 9.3%
1 | 215 | 8.6%
6 | 202 | 8.1%
0 | 202 | 8.1%
4 | 188 | 7.5%
2 | 156 | 6.2%

Latin:
Value | Count | Frequency (%)
v | 417 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2919 | 100.0%

Most frequent character per block

ASCII:
Value | Count | Frequency (%)
3 | 473 | 16.2%
- | 417 | 14.3%
(space) | 417 | 14.3%
v | 417 | 14.3%
5 | 232 | 7.9%
1 | 215 | 7.4%
6 | 202 | 6.9%
0 | 202 | 6.9%
4 | 188 | 6.4%
2 | 156 | 5.3%

Sukupuoli
Categorical

MISSING

Distinct: 3
Distinct (%): 0.8%
Missing: 32
Missing (%): 7.6%
Memory size: 679.0 B
mies: 351
nainen: 28
muu: 8

Length

Max length: 6
Median length: 4
Mean length: 4.124031008
Min length: 3

Characters and Unicode

Total characters: 1596
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Value | Count | Frequency (%)
mies | 351 | 83.8%
nainen | 28 | 6.7%
muu | 8 | 1.9%
(Missing) | 32 | 7.6%

Histogram of lengths of the category

Value | Count | Frequency (%)
mies | 351 | 90.7%
nainen | 28 | 7.2%
muu | 8 | 2.1%

Most occurring characters

Value | Count | Frequency (%)
i | 379 | 23.7%
e | 379 | 23.7%
m | 359 | 22.5%
s | 351 | 22.0%
n | 84 | 5.3%
a | 28 | 1.8%
u | 16 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1596 | 100.0%

Most frequent character per category

Lowercase Letter:
Value | Count | Frequency (%)
i | 379 | 23.7%
e | 379 | 23.7%
m | 359 | 22.5%
s | 351 | 22.0%
n | 84 | 5.3%
a | 28 | 1.8%
u | 16 | 1.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1596 | 100.0%

Most frequent character per script

Latin:
Value | Count | Frequency (%)
i | 379 | 23.7%
e | 379 | 23.7%
m | 359 | 22.5%
s | 351 | 22.0%
n | 84 | 5.3%
a | 28 | 1.8%
u | 16 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1596 | 100.0%

Most frequent character per block

ASCII:
Value | Count | Frequency (%)
i | 379 | 23.7%
e | 379 | 23.7%
m | 359 | 22.5%
s | 351 | 22.0%
n | 84 | 5.3%
a | 28 | 1.8%
u | 16 | 1.0%

Työkokemus
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 6.5%
Missing: 4
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.607228916
Minimum: 0
Maximum: 30
Zeros: 3
Zeros (%): 0.7%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 9
Q3: 13
95th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.055894704
Coefficient of variation (CV): 0.6303477055
Kurtosis: -0.01045195203
Mean: 9.607228916
Median Absolute Deviation (MAD): 4
Skewness: 0.7233660043
Sum: 3987
Variance: 36.67386066
Monotonicity: Not monotonic
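
Several of the figures above are tied together by simple identities, which allows a quick consistency check. The sketch below assumes the mean is taken over the 415 non-missing answers (419 observations minus 4 missing) and that the variance is the square of the reported standard deviation. The variable names are illustrative and not part of the report.

```python
# Values copied from the Työkokemus statistics in this report.
mean = 9.607228916
std = 6.055894704
total = 3987
q1, q3 = 5, 13
minimum, maximum = 0, 30
n_observations, n_missing = 419, 4

n_valid = n_observations - n_missing   # answers that actually carry a value
cv = std / mean                        # coefficient of variation
variance = std ** 2                    # variance is the squared standard deviation
iqr = q3 - q1                          # interquartile range
value_range = maximum - minimum

print(f"mean from sum: {total / n_valid:.6f}")
print(f"CV: {cv:.6f}, variance: {variance:.4f}, IQR: {iqr}, range: {value_range}")
```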
Histogram with fixed size bins (bins=27)
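
Value | Count | Frequency (%)
5 | 43 | 10.3%
10 | 34 | 8.1%
4 | 28 | 6.7%
7 | 26 | 6.2%
20 | 25 | 6.0%
3 | 24 | 5.7%
15 | 23 | 5.5%
13 | 22 | 5.3%
8 | 21 | 5.0%
6 | 21 | 5.0%
Other values (17) | 148 | 35.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 3 | 0.7%
1 | 15 | 3.6%
2 | 21 | 5.0%
3 | 24 | 5.7%
4 | 28 | 6.7%

Maximum 5 values

Value | Count | Frequency (%)
30 | 2 | 0.5%
25 | 5 | 1.2%
24 | 2 | 0.5%
23 | 4 | 1.0%
22 | 3 | 0.7%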

Työsuhteen luonne
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.4 KiB
Työntekijä / palkollinen: 373
Freelancer: 23
Yrittäjä: 22

Length

Max length: 24
Median length: 24
Mean length: 22.38755981
Min length: 8

Characters and Unicode

Total characters: 9358
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Value | Count | Frequency (%)
Työntekijä / palkollinen | 373 | 89.0%
Freelancer | 23 | 5.5%
Yrittäjä | 22 | 5.3%
(Missing) | 1 | 0.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
(empty) | 373 | 32.0%
työntekijä | 373 | 32.0%
palkollinen | 373 | 32.0%
freelancer | 23 | 2.0%
yrittäjä | 22 | 1.9%

Most occurring characters

ValueCountFrequency (%)
n1142
12.2%
l1142
12.2%
e815
 
8.7%
i768
 
8.2%
k746
 
8.0%
746
 
8.0%
t417
 
4.5%
ä417
 
4.5%
a396
 
4.2%
j395
 
4.2%
Other values (10)2374
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7821
83.6%
Space Separator746
 
8.0%
Uppercase Letter418
 
4.5%
Other Punctuation373
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
n1142
14.6%
l1142
14.6%
e815
10.4%
i768
9.8%
k746
9.5%
t417
 
5.3%
ä417
 
5.3%
a396
 
5.1%
j395
 
5.1%
y373
 
4.8%
Other values (5)1210
15.5%
ValueCountFrequency (%)
T373
89.2%
F23
 
5.5%
Y22
 
5.3%
ValueCountFrequency (%)
746
100.0%
ValueCountFrequency (%)
/373
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8239
88.0%
Common1119
 
12.0%

Most frequent character per script

ValueCountFrequency (%)
n1142
13.9%
l1142
13.9%
e815
9.9%
i768
9.3%
k746
9.1%
t417
 
5.1%
ä417
 
5.1%
a396
 
4.8%
j395
 
4.8%
T373
 
4.5%
Other values (8)1628
19.8%
ValueCountFrequency (%)
746
66.7%
/373
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8568
91.6%
None790
 
8.4%

Most frequent character per block

ValueCountFrequency (%)
n1142
13.3%
l1142
13.3%
e815
9.5%
i768
9.0%
k746
8.7%
746
8.7%
t417
 
4.9%
a396
 
4.6%
j395
 
4.6%
T373
 
4.4%
Other values (8)1628
19.0%
ValueCountFrequency (%)
ä417
52.8%
ö373
47.2%

Työaika
Categorical

MISSING

Distinct: 5
Distinct (%): 1.2%
Missing: 16
Missing (%): 3.8%
Memory size: 3.4 KiB
1.0: 380
0.8: 19
0.5: 2
0.6: 1
0.7: 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1209
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
1.0 | 380 | 90.7%
0.8 | 19 | 4.5%
0.5 | 2 | 0.5%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%
(Missing) | 16 | 3.8%

Histogram of lengths of the category

Value | Count | Frequency (%)
1.0 | 380 | 94.3%
0.8 | 19 | 4.7%
0.5 | 2 | 0.5%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
. | 403 | 33.3%
0 | 403 | 33.3%
1 | 380 | 31.4%
8 | 19 | 1.6%
5 | 2 | 0.2%
7 | 1 | 0.1%
6 | 1 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 806 | 66.7%
Other Punctuation | 403 | 33.3%

Most frequent character per category

Decimal Number:
Value | Count | Frequency (%)
0 | 403 | 50.0%
1 | 380 | 47.1%
8 | 19 | 2.4%
5 | 2 | 0.2%
7 | 1 | 0.1%
6 | 1 | 0.1%

Other Punctuation:
Value | Count | Frequency (%)
. | 403 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1209 | 100.0%

Most frequent character per script

Common:
Value | Count | Frequency (%)
. | 403 | 33.3%
0 | 403 | 33.3%
1 | 380 | 31.4%
8 | 19 | 1.6%
5 | 2 | 0.2%
7 | 1 | 0.1%
6 | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1209 | 100.0%

Most frequent character per block

ASCII:
Value | Count | Frequency (%)
. | 403 | 33.3%
0 | 403 | 33.3%
1 | 380 | 31.4%
8 | 19 | 1.6%
5 | 2 | 0.2%
7 | 1 | 0.1%
6 | 1 | 0.1%

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 224
Distinct (%): 54.8%
Missing: 10
Missing (%): 2.4%
Memory size: 3.4 KiB
Ohjelmistokehittäjä: 33
full-stack: 28
Full-stack: 21
Arkkitehti: 15
ohjelmistokehittäjä: 14
Other values (219): 298

Length

Max length: 67
Median length: 18
Mean length: 19.1809291
Min length: 2

Characters and Unicode

Total characters: 7845
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 181
Unique (%): 44.3%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Value | Count | Frequency (%)
Ohjelmistokehittäjä | 33 | 7.9%
full-stack | 28 | 6.7%
Full-stack | 21 | 5.0%
Arkkitehti | 15 | 3.6%
ohjelmistokehittäjä | 14 | 3.3%
Full-stack ohjelmistokehittäjä | 8 | 1.9%
full-stack ohjelmistokehittäjä | 6 | 1.4%
arkkitehti | 6 | 1.4%
Frontend | 5 | 1.2%
Full-stack kehittäjä | 5 | 1.2%
Other values (214) | 268 | 64.0%
(Missing) | 10 | 2.4%

Histogram of lengths of the category

Value | Count | Frequency (%)
full-stack | 125 | 16.6%
ohjelmistokehittäjä | 97 | 12.9%
developer | 52 | 6.9%
arkkitehti | 33 | 4.4%
(empty) | 29 | 3.9%
lead | 27 | 3.6%
frontend | 23 | 3.1%
senior | 18 | 2.4%
kehittäjä | 15 | 2.0%
backend | 13 | 1.7%
Other values (164) | 320 | 42.6%

Most occurring characters

ValueCountFrequency (%)
t819
 
10.4%
e719
 
9.2%
i578
 
7.4%
l576
 
7.3%
k441
 
5.6%
o410
 
5.2%
s373
 
4.8%
a373
 
4.8%
347
 
4.4%
h314
 
4.0%
Other values (47)2895
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6820
86.9%
Uppercase Letter388
 
4.9%
Space Separator348
 
4.4%
Dash Punctuation150
 
1.9%
Other Punctuation85
 
1.1%
Open Punctuation23
 
0.3%
Close Punctuation23
 
0.3%
Math Symbol8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
t819
12.0%
e719
 
10.5%
i578
 
8.5%
l576
 
8.4%
k441
 
6.5%
o410
 
6.0%
s373
 
5.5%
a373
 
5.5%
h314
 
4.6%
j294
 
4.3%
Other values (16)1923
28.2%
ValueCountFrequency (%)
F88
22.7%
O80
20.6%
S44
11.3%
D38
9.8%
A24
 
6.2%
T20
 
5.2%
L16
 
4.1%
C13
 
3.4%
E10
 
2.6%
P9
 
2.3%
Other values (11)46
11.9%
ValueCountFrequency (%)
,50
58.8%
/31
36.5%
&3
 
3.5%
.1
 
1.2%
ValueCountFrequency (%)
347
99.7%
 1
 
0.3%
ValueCountFrequency (%)
-150
100.0%
ValueCountFrequency (%)
(23
100.0%
ValueCountFrequency (%)
)23
100.0%
ValueCountFrequency (%)
+8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7208
91.9%
Common637
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
t819
 
11.4%
e719
 
10.0%
i578
 
8.0%
l576
 
8.0%
k441
 
6.1%
o410
 
5.7%
s373
 
5.2%
a373
 
5.2%
h314
 
4.4%
j294
 
4.1%
Other values (37)2311
32.1%
ValueCountFrequency (%)
347
54.5%
-150
23.5%
,50
 
7.8%
/31
 
4.9%
(23
 
3.6%
)23
 
3.6%
+8
 
1.3%
&3
 
0.5%
.1
 
0.2%
 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7549
96.2%
None296
 
3.8%

Most frequent character per block

ValueCountFrequency (%)
t819
 
10.8%
e719
 
9.5%
i578
 
7.7%
l576
 
7.6%
k441
 
5.8%
o410
 
5.4%
s373
 
4.9%
a373
 
4.9%
347
 
4.6%
h314
 
4.2%
Other values (44)2599
34.4%
ValueCountFrequency (%)
ä280
94.6%
ö15
 
5.1%
 1
 
0.3%

Etä
Categorical

MISSING

Distinct: 2
Distinct (%): 0.7%
Missing: 145
Missing (%): 34.6%
Memory size: 671.0 B
Etä: 175
50/50: 99

Length

Max length: 5
Median length: 3
Mean length: 3.722627737
Min length: 3

Characters and Unicode

Total characters: 1020
Distinct characters: 6
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Value | Count | Frequency (%)
Etä | 175 | 41.8%
50/50 | 99 | 23.6%
(Missing) | 145 | 34.6%

Histogram of lengths of the category

Value | Count | Frequency (%)
etä | 175 | 63.9%
50/50 | 99 | 36.1%

Most occurring characters

Value | Count | Frequency (%)
5 | 198 | 19.4%
0 | 198 | 19.4%
E | 175 | 17.2%
t | 175 | 17.2%
ä | 175 | 17.2%
/ | 99 | 9.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 396 | 38.8%
Lowercase Letter | 350 | 34.3%
Uppercase Letter | 175 | 17.2%
Other Punctuation | 99 | 9.7%

Most frequent character per category

Decimal Number:
Value | Count | Frequency (%)
5 | 198 | 50.0%
0 | 198 | 50.0%

Lowercase Letter:
Value | Count | Frequency (%)
t | 175 | 50.0%
ä | 175 | 50.0%

Other Punctuation:
Value | Count | Frequency (%)
/ | 99 | 100.0%

Uppercase Letter:
Value | Count | Frequency (%)
E | 175 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 525 | 51.5%
Common | 495 | 48.5%

Most frequent character per script

Common:
Value | Count | Frequency (%)
5 | 198 | 40.0%
0 | 198 | 40.0%
/ | 99 | 20.0%

Latin:
Value | Count | Frequency (%)
E | 175 | 33.3%
t | 175 | 33.3%
ä | 175 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 845 | 82.8%
None | 175 | 17.2%

Most frequent character per block

ASCII:
Value | Count | Frequency (%)
5 | 198 | 23.4%
0 | 198 | 23.4%
E | 175 | 20.7%
t | 175 | 20.7%
/ | 99 | 11.7%

None:
Value | Count | Frequency (%)
ä | 175 | 100.0%

Kuukausipalkka
Real number (ℝ≥0)

MISSING

Distinct: 117
Distinct (%): 30.6%
Missing: 37
Missing (%): 8.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 4662.848168
Minimum: 1666
Maximum: 15000
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 1666
5th percentile: 2801
Q1: 3812.5
Median: 4500
Q3: 5500
95th percentile: 6948.2
Maximum: 15000
Range: 13334
Interquartile range (IQR): 1687.5

Descriptive statistics

Standard deviation: 1311.622624
Coefficient of variation (CV): 0.2812921582
Kurtosis: 9.4332742
Mean: 4662.848168
Median Absolute Deviation (MAD): 750
Skewness: 1.516453002
Sum: 1781208
Variance: 1720353.909
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
4000 | 22 | 5.3%
4500 | 20 | 4.8%
6000 | 15 | 3.6%
5000 | 15 | 3.6%
5500 | 14 | 3.3%
4800 | 11 | 2.6%
7000 | 10 | 2.4%
4200 | 10 | 2.4%
4300 | 10 | 2.4%
3800 | 9 | 2.1%
Other values (107) | 246 | 58.7%
(Missing) | 37 | 8.8%

Minimum 5 values

Value | Count | Frequency (%)
1666 | 1 | 0.2%
1700 | 1 | 0.2%
1800 | 1 | 0.2%
2100 | 1 | 0.2%
2275 | 1 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
15000 | 1 | 0.2%
8500 | 1 | 0.2%
8000 | 4 | 1.0%
7500 | 2 | 0.5%
7200 | 1 | 0.2%

Vuositulot
Real number (ℝ≥0)

MISSING

Distinct: 168
Distinct (%): 41.2%
Missing: 11
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 65191.95221
Minimum: 0
Maximum: 250000
Zeros: 2
Zeros (%): 0.5%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 35000
Q1: 50000
Median: 60000
Q3: 75000
95th percentile: 120000
Maximum: 250000
Range: 250000
Interquartile range (IQR): 25000

Descriptive statistics

Standard deviation: 28777.86007
Coefficient of variation (CV): 0.441432709
Kurtosis: 8.457139281
Mean: 65191.95221
Median Absolute Deviation (MAD): 12000
Skewness: 2.191086173
Sum: 26598316.5
Variance: 828165230.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
55000 | 16 | 3.8%
60000 | 14 | 3.3%
75000 | 14 | 3.3%
50000 | 14 | 3.3%
65000 | 10 | 2.4%
62500 | 9 | 2.1%
85000 | 9 | 2.1%
80000 | 9 | 2.1%
54000 | 8 | 1.9%
40000 | 8 | 1.9%
Other values (158) | 297 | 70.9%
(Missing) | 11 | 2.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | 0.5%
4000 | 1 | 0.2%
6100 | 1 | 0.2%
7500 | 1 | 0.2%
20000 | 1 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
250000 | 1 | 0.2%
200000 | 3 | 0.7%
190000 | 1 | 0.2%
180000 | 1 | 0.2%
155000 | 1 | 0.2%

Kilpailukykyinen
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.5%
Missing: 13
Missing (%): 3.1%
Memory size: 3.4 KiB
True: 285
False: 121
(Missing): 13

Value | Count | Frequency (%)
True | 285 | 68.0%
False | 121 | 28.9%
(Missing) | 13 | 3.1%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 66
Distinct (%): 69.5%
Missing: 324
Missing (%): 77.3%
Memory size: 3.4 KiB
Gofore: 11
Vincit: 6
Fraktio: 4
Futurice: 4
Arado: 3
Other values (61): 67

Length

Max length: 132
Median length: 7
Mean length: 10.67368421
Min length: 2

Characters and Unicode

Total characters: 1014
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 56
Unique (%): 58.9%

Sample

1st row: Questrade
2nd row: Digia Oyj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Value | Count | Frequency (%)
Gofore | 11 | 2.6%
Vincit | 6 | 1.4%
Fraktio | 4 | 1.0%
Futurice | 4 | 1.0%
Arado | 3 | 0.7%
Pankki | 3 | 0.7%
Siili | 2 | 0.5%
Gofore Oyj | 2 | 0.5%
KVTES-alainen kunnan omistama oy | 2 | 0.5%
Qvik | 2 | 0.5%
Other values (56) | 56 | 13.4%
(Missing) | 324 | 77.3%

Histogram of lengths of the category

Value | Count | Frequency (%)
gofore | 13 | 8.6%
oy | 9 | 5.9%
vincit | 6 | 3.9%
oyj | 4 | 2.6%
futurice | 4 | 2.6%
fraktio | 4 | 2.6%
arado | 3 | 2.0%
siili | 3 | 2.0%
omistama | 3 | 2.0%
pankki | 3 | 2.0%
Other values (88) | 100 | 65.8%

Most occurring characters

ValueCountFrequency (%)
i101
 
10.0%
a79
 
7.8%
o79
 
7.8%
t71
 
7.0%
e70
 
6.9%
60
 
5.9%
r55
 
5.4%
n50
 
4.9%
l41
 
4.0%
u40
 
3.9%
Other values (43)368
36.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter821
81.0%
Uppercase Letter127
 
12.5%
Space Separator60
 
5.9%
Other Punctuation3
 
0.3%
Dash Punctuation3
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
G14
 
11.0%
O13
 
10.2%
V12
 
9.4%
S11
 
8.7%
F9
 
7.1%
A7
 
5.5%
K7
 
5.5%
C6
 
4.7%
P6
 
4.7%
E5
 
3.9%
Other values (15)37
29.1%
ValueCountFrequency (%)
i101
12.3%
a79
9.6%
o79
9.6%
t71
 
8.6%
e70
 
8.5%
r55
 
6.7%
n50
 
6.1%
l41
 
5.0%
u40
 
4.9%
k38
 
4.6%
Other values (15)197
24.0%
ValueCountFrequency (%)
60
100.0%
ValueCountFrequency (%)
.3
100.0%
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin948
93.5%
Common66
 
6.5%

Most frequent character per script

ValueCountFrequency (%)
i101
 
10.7%
a79
 
8.3%
o79
 
8.3%
t71
 
7.5%
e70
 
7.4%
r55
 
5.8%
n50
 
5.3%
l41
 
4.3%
u40
 
4.2%
k38
 
4.0%
Other values (40)324
34.2%
ValueCountFrequency (%)
60
90.9%
.3
 
4.5%
-3
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1002
98.8%
None12
 
1.2%

Most frequent character per block

ValueCountFrequency (%)
i101
 
10.1%
a79
 
7.9%
o79
 
7.9%
t71
 
7.1%
e70
 
7.0%
60
 
6.0%
r55
 
5.5%
n50
 
5.0%
l41
 
4.1%
u40
 
4.0%
Other values (41)356
35.5%
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 32
Distinct (%): 97.0%
Missing: 386
Missing (%): 92.1%
Memory size: 3.4 KiB
palkan lisänä lounas- ja virkistysetu: 2
Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu: 1
Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä.: 1
Startup: 1
Korona-aika on lisännyt etätyön määrää. Aiemmin pari päivää viikossa etänä, nyt kokonaan. Paluuta vanhaan ei varmaankaan ole, ehkä päivä viikossa konttorilla ihan sosiaalisten kontaktien takia.: 1
Other values (27): 27

Length

Max length: 286
Median length: 71
Mean length: 95.54545455
Min length: 7

Characters and Unicode

Total characters: 3153
Distinct characters: 55
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Value | Count | Frequency (%)
palkan lisänä lounas- ja virkistysetu | 2 | 0.5%
Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu | 1 | 0.2%
Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä. | 1 | 0.2%
Startup | 1 | 0.2%
Korona-aika on lisännyt etätyön määrää. Aiemmin pari päivää viikossa etänä, nyt kokonaan. Paluuta vanhaan ei varmaankaan ole, ehkä päivä viikossa konttorilla ihan sosiaalisten kontaktien takia. | 1 | 0.2%
Osittain laskutukseen perustuva palkka joten vaihtelee. | 1 | 0.2%
Palkka perustuu osittain laskutukseen, joten vuositulot vaihtelevat hieman. | 1 | 0.2%
Palkka riippuu osittain firman tuloksesta, joten vaikea sanoa tarkkaan. | 1 | 0.2%
palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen. | 1 | 0.2%
Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys. | 1 | 0.2%
Other values (22) | 22 | 5.3%
(Missing) | 386 | 92.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
ei | 10 | 2.5%
palkka | 9 | 2.3%
on | 8 | 2.0%
ole | 6 | 1.5%
mutta | 6 | 1.5%
ja | 6 | 1.5%
joten | 4 | 1.0%
ihan | 4 | 1.0%
palkan | 4 | 1.0%
firman | 4 | 1.0%
Other values (281) | 334 | 84.6%

Most occurring characters

ValueCountFrequency (%)
365
11.6%
a331
10.5%
i271
 
8.6%
t247
 
7.8%
n216
 
6.9%
s205
 
6.5%
e203
 
6.4%
k185
 
5.9%
l161
 
5.1%
o146
 
4.6%
Other values (45)823
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2625
83.3%
Space Separator365
 
11.6%
Other Punctuation74
 
2.3%
Uppercase Letter47
 
1.5%
Decimal Number27
 
0.9%
Dash Punctuation6
 
0.2%
Open Punctuation3
 
0.1%
Close Punctuation3
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a331
12.6%
i271
10.3%
t247
9.4%
n216
 
8.2%
s205
 
7.8%
e203
 
7.7%
k185
 
7.0%
l161
 
6.1%
o146
 
5.6%
u118
 
4.5%
Other values (14)542
20.6%
ValueCountFrequency (%)
T7
14.9%
P7
14.9%
O6
12.8%
E6
12.8%
V6
12.8%
S4
8.5%
K3
6.4%
I2
 
4.3%
H2
 
4.3%
R1
 
2.1%
Other values (3)3
6.4%
ValueCountFrequency (%)
015
55.6%
13
 
11.1%
52
 
7.4%
22
 
7.4%
82
 
7.4%
62
 
7.4%
31
 
3.7%
ValueCountFrequency (%)
.38
51.4%
,23
31.1%
/5
 
6.8%
%4
 
5.4%
"2
 
2.7%
?2
 
2.7%
ValueCountFrequency (%)
365
100.0%
ValueCountFrequency (%)
(3
100.0%
ValueCountFrequency (%)
)3
100.0%
ValueCountFrequency (%)
+3
100.0%
ValueCountFrequency (%)
-6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2672
84.7%
Common481
 
15.3%

Most frequent character per script

ValueCountFrequency (%)
a331
12.4%
i271
10.1%
t247
9.2%
n216
 
8.1%
s205
 
7.7%
e203
 
7.6%
k185
 
6.9%
l161
 
6.0%
o146
 
5.5%
u118
 
4.4%
Other values (27)589
22.0%
ValueCountFrequency (%)
365
75.9%
.38
 
7.9%
,23
 
4.8%
015
 
3.1%
-6
 
1.2%
/5
 
1.0%
%4
 
0.8%
(3
 
0.6%
)3
 
0.6%
+3
 
0.6%
Other values (8)16
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3017
95.7%
None136
 
4.3%

Most frequent character per block

ValueCountFrequency (%)
365
12.1%
a331
11.0%
i271
 
9.0%
t247
 
8.2%
n216
 
7.2%
s205
 
6.8%
e203
 
6.7%
k185
 
6.1%
l161
 
5.3%
o146
 
4.8%
Other values (43)687
22.8%
ValueCountFrequency (%)
ä112
82.4%
ö24
 
17.6%

Interactions

[Figures: six interaction plots; images not preserved in this text]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so the steepness of a linear relationship does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
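
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report itself runs. The function name is made up for the example, and the data are the Kuukausipalkka and Vuositulot values of sample rows 0 to 8 listed under "First rows".

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)

# Kuukausipalkka and Vuositulot from sample rows 0-8 of this report.
monthly = [6500, 5000, 2475, 4300, 3000, 8000, 6000, 4250, 4000]
annual = [83000, 62500, 30000, 100000, 37500, 100000, 140000, 54000, 50000]

r = pearson_r(monthly, annual)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(monthly, [12.5 * a + 1000 for a in annual])
print(round(r, 3), round(r_rescaled, 3))
```

Because nine rows are far too few to estimate the correlation of the full dataset, the printed value only illustrates the mechanics.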

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
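
A minimal sketch of that procedure, assuming tied values receive the average of the ranks they span (a common convention, not something this report states). The helper names and the toy data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: rho reaches 1 while r does not.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```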

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
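
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with an illustrative function name. Statistical libraries often apply a tie-corrected variant (tau-b) instead, which this example does not implement.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the usage line, two of the three pairs are concordant and one is discordant, so τ is (2 - 1) / 3.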

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
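
A sketch of the bias-corrected coefficient in plain Python, following the correction formula from Bergsma (2013) as it is commonly stated. The chi-squared statistic is computed without a continuity correction, which is an assumption here. The function name and toy data are illustrative, and this is not necessarily how the report's software implements it.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Bergsma's correction shrinks phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0               # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)

perfect = cramers_v_corrected(["a", "b"] * 10, ["x", "y"] * 10)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["x", "y", "x", "y"] * 5)
print(perfect, independent)
```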

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
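
Nullity correlation can be read as an ordinary Pearson correlation between "value present" indicators of two columns. The sketch below assumes that interpretation and uses None for a missing cell. The function name is illustrative, and the data come from the first ten sample rows of this report, where Kuukausipalkka is missing exactly in the one row that has a Työpaikka value.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if value is None else 1 for value in col_a]
    b = [0 if value is None else 1 for value in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # always present or always missing: correlation undefined
    return cov / math.sqrt(var_a * var_b)

# First ten sample rows of this report (None marks a missing cell).
kuukausipalkka = [6500, 5000, 2475, 4300, 3000, 8000, 6000, 4250, 4000, None]
tyopaikka = [None] * 9 + ["Questrade"]
kaupunki = ["PK-Seutu", "Turku", "PK-Seutu", "Tampere", "PK-Seutu",
            "PK-Seutu", "PK-Seutu", "Tampere", "Tampere", "Tallinna"]

print(nullity_correlation(kuukausipalkka, tyopaikka))
print(nullity_correlation(kaupunki, tyopaikka))
```

On these ten rows the first pair is perfectly negatively correlated, and the second call returns None because Kaupunki has no missing values in the excerpt. The full dataset will give different values.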
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Index | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN
1 | 2021-02-15 11:57:19.676 | Turku | 31-35 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN
3 | 2021-02-15 11:58:15.261 | Tampere | 31-35 v | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 41-45 v | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 8000.0 | 100000.0 | True | NaN | NaN
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 31-35 v | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN
7 | 2021-02-15 12:00:04.655 | Tampere | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4250.0 | 54000.0 | True | NaN | NaN
8 | 2021-02-15 12:01:00.769 | Tampere | 31-35 v | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | NaN | 4000.0 | 50000.0 | False | NaN | NaN
9 | 2021-02-15 12:02:03.577 | Tallinna | 31-35 v | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN

Last rows

Index | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
409 | 2021-02-19 14:54:21.221 | Tampere | 36-40 v | NaN | 12.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistosuunnittelija | NaN | 3800.0 | 50000.0 | False | NaN | NaN
410 | 2021-02-19 15:01:20.423 | Turku | 31-35 v | mies | 9.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | NaN | 3900.0 | 52000.0 | False | NaN | NaN
411 | 2021-02-19 15:06:06.295 | PK-Seutu | 36-40 v | nainen | 14.0 | Työntekijä / palkollinen | 1.0 | Senior consultant | Etä | 8500.0 | 100000.0 | True | Sulava | NaN
412 | 2021-02-19 15:13:51.743 | Pori | 36-40 v | mies | 8.0 | Työntekijä / palkollinen | 1.0 | Tech Lead | Etä | 5080.0 | 65000.0 | False | Iso konsulttitalo | Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys.
413 | 2021-02-19 15:24:01.085 | Tampere | 36-40 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistotestaaja | Etä | 4100.0 | 55000.0 | True | NaN | NaN
414 | 2021-02-19 15:34:53.741 | Tampere | 26-30 v | muu | 7.0 | Työntekijä / palkollinen | 1.0 | Full-stack developer | 50/50 | 5550.0 | 69400.0 | True | NaN | NaN
415 | 2021-02-19 15:40:16.336 | PK-Seutu | 26-30 v | mies | 5.0 | Työntekijä / palkollinen | 0.8 | Full-stack/mobiili/design | Etä | 7000.0 | 90000.0 | True | Mavericks | NaN
416 | 2021-02-19 16:04:50.348 | Tampere | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4800.0 | 65000.0 | True | NaN | Bonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.
417 | 2021-02-19 16:17:29.891 | PK-Seutu | 36-40 v | nainen | 8.0 | Työntekijä / palkollinen | NaN | Product Owner | 50/50 | 4500.0 | 56200.0 | True | NaN | NaN
418 | 2021-02-19 16:26:32.700 | PK-Seutu | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Mobile SW | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN