Overview

Dataset statistics

Number of variables: 14
Number of observations: 417
Missing cells: 968
Missing cells (%): 16.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 35.7 KiB
Average record size in memory: 87.6 B
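The missing-cell figures can be checked by hand. Add up the per-variable missing counts listed in the Variables section, then divide by the total number of cells (14 variables × 417 observations = 5838). The minimal sketch below does this with the counts copied from this report. Timestamp and Vuositulot have no missing values.

```python
# Per-variable missing counts as listed in this report's Variables section.
MISSING_COUNTS = {
    "Timestamp": 0,
    "Kaupunki": 4,
    "Ikä": 2,
    "Sukupuoli": 32,
    "Työkokemus": 4,
    "Työsuhteen luonne": 1,
    "Työaika": 15,
    "Rooli": 10,
    "Etä": 145,
    "Kuukausipalkka": 35,
    "Vuositulot": 0,
    "Kilpailukykyinen": 13,
    "Työpaikka": 323,
    "Vapaa sana": 384,
}
N_OBSERVATIONS = 417


def missing_cell_summary(missing_counts, n_rows):
    """Return (total missing cells, missing cells as a percentage of all cells)."""
    total_missing = sum(missing_counts.values())
    total_cells = len(missing_counts) * n_rows
    return total_missing, 100.0 * total_missing / total_cells


total, pct = missing_cell_summary(MISSING_COUNTS, N_OBSERVATIONS)
print(f"Missing cells: {total} ({pct:.1f}%)")  # Missing cells: 968 (16.6%)
```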

Variable types

DateTime: 1
Categorical: 8
Numeric: 2
Unsupported: 2
Boolean: 1

Warnings

Rooli has high cardinality: 223 distinct values (High cardinality)
Työpaikka has high cardinality: 65 distinct values (High cardinality)
Kilpailukykyinen is highly correlated with Vapaa sana (High correlation)
Vapaa sana is highly correlated with Kilpailukykyinen and 1 other field (High correlation)
Työpaikka is highly correlated with Vapaa sana (High correlation)
Sukupuoli has 32 (7.7%) missing values (Missing)
Työaika has 15 (3.6%) missing values (Missing)
Rooli has 10 (2.4%) missing values (Missing)
Etä has 145 (34.8%) missing values (Missing)
Kuukausipalkka has 35 (8.4%) missing values (Missing)
Kilpailukykyinen has 13 (3.1%) missing values (Missing)
Työpaikka has 323 (77.5%) missing values (Missing)
Vapaa sana has 384 (92.1%) missing values (Missing)
Työaika is highly skewed (γ1 = 20.02126569) (Skewed)
Vapaa sana is uniformly distributed (Uniform)
Timestamp has unique values (Unique)
Kuukausipalkka has an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Vuositulot has an unsupported type; check whether it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2021-02-19 14:12:34.841689
Analysis finished: 2021-02-19 14:12:38.045552
Duration: 3.2 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

Timestamp
Date

UNIQUE

Distinct: 417
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-19 16:04:50.348000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

Distinct: 25
Distinct (%): 6.1%
Missing: 4
Missing (%): 1.0%
Memory size: 1.3 KiB
PK-Seutu: 210
Tampere: 97
Turku: 42
Oulu: 22
Jyväskylä: 17
Other values (20): 25

Length

Max length: 15
Median length: 8
Mean length: 7.249394673
Min length: 2
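The mean length equals total characters divided by the number of non-missing values: 2994 ÷ 413 ≈ 7.2494, which matches the Mean length above. The sketch below computes the same four length statistics. It assumes the column is a list of Python strings with None marking a missing entry; in pandas these would be NaN instead.

```python
import statistics


def length_stats(values):
    """Character-length summary of the non-missing string values of a column."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.mean(lengths),
        "Min length": min(lengths),
    }


print(length_stats(["Oulu", "Turku", None, "PK-Seutu"]))
```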

Characters and Unicode

Total characters: 2994
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
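The "Most occurring categories" tables below group characters by their Unicode general category. Python's standard `unicodedata.category` returns the two-letter code, for example `Lu` for an uppercase letter or `Pd` for dash punctuation. The name mapping in this sketch covers only the categories that appear in this report. The standard library does not expose the Script or Block properties, so reproducing those tables would need a third-party Unicode database.

```python
import unicodedata
from collections import Counter

# Human-readable names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all non-missing strings."""
    counts = Counter()
    for value in values:
        if value is None:  # skip missing entries
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["PK-Seutu", "Tampere", None, "Jyväskylä"]))
```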

Unique

Unique: 18
Unique (%): 4.4%
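"Distinct" and "Unique" measure different things. Distinct counts how many different values occur. Unique counts the values that occur exactly once. Both percentages here are relative to the 413 non-missing rows: 25 ÷ 413 ≈ 6.1% and 18 ÷ 413 ≈ 4.4%. The sketch below computes both counts, again treating None as missing.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) counts over non-missing values.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


cities = ["PK-Seutu", "PK-Seutu", "Tampere", "Wien", None, "Viimsi"]
print(distinct_and_unique(cities))  # (4, 3)
```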

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu

Value | Count | Frequency (%)
PK-Seutu | 210 | 50.4%
Tampere | 97 | 23.3%
Turku | 42 | 10.1%
Oulu | 22 | 5.3%
Jyväskylä | 17 | 4.1%
Kuopio | 5 | 1.2%
Pori | 2 | 0.5%
Ruotsi | 1 | 0.2%
Wien | 1 | 0.2%
Viimsi | 1 | 0.2%
Other values (15) | 15 | 3.6%
(Missing) | 4 | 1.0%
Histogram of lengths of the category
Value | Count | Frequency (%)
pk-seutu | 210 | 50.4%
tampere | 97 | 23.3%
turku | 42 | 10.1%
oulu | 22 | 5.3%
jyväskylä | 17 | 4.1%
kuopio | 5 | 1.2%
pori | 2 | 0.5%
new | 1 | 0.2%
viimsi | 1 | 0.2%
francisco | 1 | 0.2%
Other values (19) | 19 | 4.6%
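The lowercase table above appears to count words rather than whole values. Each value is lowercased and split on whitespace, which is why multi-word city names contribute tokens such as "new" and "francisco". The sketch below reproduces that tokenisation under the assumption of plain whitespace splitting; the profiler's exact rules may differ, for example in how punctuation is handled.

```python
from collections import Counter


def word_counts(values):
    """Lowercase each non-missing value, split on whitespace, and count the words."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(value.lower().split())
    return counts


print(word_counts(["PK-Seutu", "pk-seutu", "San Francisco", None]).most_common(2))
```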

Most occurring characters

ValueCountFrequency (%)
u560
18.7%
e413
13.8%
K218
 
7.3%
t215
 
7.2%
P213
 
7.1%
-212
 
7.1%
S212
 
7.1%
r145
 
4.8%
T140
 
4.7%
a114
 
3.8%
Other values (29)552
18.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1939
64.8%
Uppercase Letter838
28.0%
Dash Punctuation212
 
7.1%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
u560
28.9%
e413
21.3%
t215
 
11.1%
r145
 
7.5%
a114
 
5.9%
p103
 
5.3%
m101
 
5.2%
k60
 
3.1%
l47
 
2.4%
ä41
 
2.1%
Other values (10)140
 
7.2%
ValueCountFrequency (%)
K218
26.0%
P213
25.4%
S212
25.3%
T140
16.7%
O22
 
2.6%
J18
 
2.1%
E3
 
0.4%
L3
 
0.4%
V2
 
0.2%
W1
 
0.1%
Other values (6)6
 
0.7%
ValueCountFrequency (%)
-212
100.0%
ValueCountFrequency (%)
4
100.0%
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2777
92.8%
Common217
 
7.2%

Most frequent character per script

ValueCountFrequency (%)
u560
20.2%
e413
14.9%
K218
 
7.9%
t215
 
7.7%
P213
 
7.7%
S212
 
7.6%
r145
 
5.2%
T140
 
5.0%
a114
 
4.1%
p103
 
3.7%
Other values (26)444
16.0%
ValueCountFrequency (%)
-212
97.7%
4
 
1.8%
,1
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2953
98.6%
None41
 
1.4%

Most frequent character per block

ValueCountFrequency (%)
u560
19.0%
e413
14.0%
K218
 
7.4%
t215
 
7.3%
P213
 
7.2%
-212
 
7.2%
S212
 
7.2%
r145
 
4.9%
T140
 
4.7%
a114
 
3.9%
Other values (28)511
17.3%
ValueCountFrequency (%)
ä41
100.0%

Ikä
Categorical

Distinct: 7
Distinct (%): 1.7%
Missing: 2
Missing (%): 0.5%
Memory size: 901.0 B
31-35 v: 139
26-30 v: 104
36-40 v: 89
41-45 v: 45
21-25 v: 26
Other values (2): 12

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2905
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31-35 v
2nd row: 31-35 v
3rd row: 26-30 v
4th row: 31-35 v
5th row: 26-30 v

Value | Count | Frequency (%)
31-35 v | 139 | 33.3%
26-30 v | 104 | 24.9%
36-40 v | 89 | 21.3%
41-45 v | 45 | 10.8%
21-25 v | 26 | 6.2%
46-50 v | 7 | 1.7%
51-55 v | 5 | 1.2%
(Missing) | 2 | 0.5%
Histogram of lengths of the category
Value | Count | Frequency (%)
v | 415 | 50.0%
31-35 | 139 | 16.7%
26-30 | 104 | 12.5%
36-40 | 89 | 10.7%
41-45 | 45 | 5.4%
21-25 | 26 | 3.1%
46-50 | 7 | 0.8%
51-55 | 5 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
3 | 471 | 16.2%
- | 415 | 14.3%
(space) | 415 | 14.3%
v | 415 | 14.3%
5 | 232 | 8.0%
1 | 215 | 7.4%
6 | 200 | 6.9%
0 | 200 | 6.9%
4 | 186 | 6.4%
2 | 156 | 5.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1660
57.1%
Dash Punctuation415
 
14.3%
Space Separator415
 
14.3%
Lowercase Letter415
 
14.3%

Most frequent character per category

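Decimal Number
Value | Count | Frequency (%)
3 | 471 | 28.4%
5 | 232 | 14.0%
1 | 215 | 13.0%
6 | 200 | 12.0%
0 | 200 | 12.0%
4 | 186 | 11.2%
2 | 156 | 9.4%

Dash Punctuation
Value | Count | Frequency (%)
- | 415 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 415 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
v | 415 | 100.0%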

Most occurring scripts

ValueCountFrequency (%)
Common2490
85.7%
Latin415
 
14.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 471 | 18.9%
- | 415 | 16.7%
(space) | 415 | 16.7%
5 | 232 | 9.3%
1 | 215 | 8.6%
6 | 200 | 8.0%
0 | 200 | 8.0%
4 | 186 | 7.5%
2 | 156 | 6.3%

Latin
Value | Count | Frequency (%)
v | 415 | 100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2905
100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 471 | 16.2%
- | 415 | 14.3%
(space) | 415 | 14.3%
v | 415 | 14.3%
5 | 232 | 8.0%
1 | 215 | 7.4%
6 | 200 | 6.9%
0 | 200 | 6.9%
4 | 186 | 6.4%
2 | 156 | 5.4%

Sukupuoli
Categorical

MISSING

Distinct: 3
Distinct (%): 0.8%
Missing: 32
Missing (%): 7.7%
Memory size: 677.0 B
mies: 350
nainen: 27
muu: 8

Length

Max length: 6
Median length: 4
Mean length: 4.119480519
Min length: 3

Characters and Unicode

Total characters: 1586
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Value | Count | Frequency (%)
mies | 350 | 83.9%
nainen | 27 | 6.5%
muu | 8 | 1.9%
(Missing) | 32 | 7.7%
Histogram of lengths of the category
Value | Count | Frequency (%)
mies | 350 | 90.9%
nainen | 27 | 7.0%
muu | 8 | 2.1%
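The two Sukupuoli tables use different denominators. The first divides by all 417 rows, including the 32 missing answers. The second divides only by the 385 answered rows, which is why "mies" appears as 83.9% in one and 90.9% in the other. The sketch below reproduces both columns from the counts above.

```python
def frequencies(counts, n_missing):
    """Return {value: (% of all rows, % of non-missing rows)} for a category count table."""
    n_present = sum(counts.values())
    n_total = n_present + n_missing
    return {
        value: (100.0 * c / n_total, 100.0 * c / n_present)
        for value, c in counts.items()
    }


# Counts taken from the Sukupuoli table above.
sukupuoli = {"mies": 350, "nainen": 27, "muu": 8}
for value, (pct_all, pct_present) in frequencies(sukupuoli, n_missing=32).items():
    print(f"{value}: {pct_all:.1f}% of all rows, {pct_present:.1f}% of answered rows")
# mies: 83.9% of all rows, 90.9% of answered rows
# nainen: 6.5% of all rows, 7.0% of answered rows
# muu: 1.9% of all rows, 2.1% of answered rows
```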

Most occurring characters

ValueCountFrequency (%)
i377
23.8%
e377
23.8%
m358
22.6%
s350
22.1%
n81
 
5.1%
a27
 
1.7%
u16
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1586
100.0%

Most frequent character per category

ValueCountFrequency (%)
i377
23.8%
e377
23.8%
m358
22.6%
s350
22.1%
n81
 
5.1%
a27
 
1.7%
u16
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1586
100.0%

Most frequent character per script

ValueCountFrequency (%)
i377
23.8%
e377
23.8%
m358
22.6%
s350
22.1%
n81
 
5.1%
a27
 
1.7%
u16
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1586
100.0%

Most frequent character per block

ValueCountFrequency (%)
i377
23.8%
e377
23.8%
m358
22.6%
s350
22.1%
n81
 
5.1%
a27
 
1.7%
u16
 
1.0%

Työkokemus
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 6.5%
Missing: 4
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.595641646
Minimum: 0
Maximum: 30
Zeros: 3
Zeros (%): 0.7%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 9
Q3: 13
95th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.061871958
Coefficient of variation (CV): 0.6317317988
Kurtosis: -0.005108014079
Mean: 9.595641646
Median Absolute Deviation (MAD): 4
Skewness: 0.7276936771
Sum: 3963
Variance: 36.74629164
Monotonicity: Not monotonic
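Several of these statistics follow directly from others. The coefficient of variation is standard deviation ÷ mean = 6.061871958 ÷ 9.595641646 ≈ 0.6317. The IQR is Q3 - Q1 = 13 - 5 = 8. The stdlib sketch below computes the quantile and spread figures under two assumptions: quantiles use linear interpolation (pandas' and NumPy's default), and MAD is the median of absolute deviations from the median. pandas-profiling's internals may differ in detail.

```python
import math
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile (the default method in pandas and NumPy)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def describe(values):
    """Quantile and spread statistics in the style of the report, ignoring None."""
    xs = sorted(v for v in values if v is not None)
    median = statistics.median(xs)
    mean = statistics.mean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "5th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th percentile": quantile(xs, 0.95),
        "Range": xs[-1] - xs[0],
        "IQR": q3 - q1,
        "MAD": statistics.median(abs(x - median) for x in xs),
        "CV": std / mean,
    }


print(describe([1, 2, 3, 4, 10, None]))
```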
Histogram with fixed size bins (bins=27)
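Common values
Value | Count | Frequency (%)
5 | 43 | 10.3%
10 | 34 | 8.2%
4 | 28 | 6.7%
7 | 26 | 6.2%
20 | 25 | 6.0%
3 | 24 | 5.8%
15 | 23 | 5.5%
13 | 22 | 5.3%
6 | 21 | 5.0%
11 | 21 | 5.0%
Other values (17) | 146 | 35.0%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3 | 0.7%
1 | 15 | 3.6%
2 | 21 | 5.0%
3 | 24 | 5.8%
4 | 28 | 6.7%

Maximum 5 values
Value | Count | Frequency (%)
30 | 2 | 0.5%
25 | 5 | 1.2%
24 | 2 | 0.5%
23 | 4 | 1.0%
22 | 3 | 0.7%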

Työsuhteen luonne
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.4 KiB
Työntekijä / palkollinen: 371
Freelancer: 23
Yrittäjä: 22

Length

Max length: 24
Median length: 24
Mean length: 22.37980769
Min length: 8

Characters and Unicode

Total characters: 9310
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Value | Count | Frequency (%)
Työntekijä / palkollinen | 371 | 89.0%
Freelancer | 23 | 5.5%
Yrittäjä | 22 | 5.3%
(Missing) | 1 | 0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
371
32.0%
palkollinen371
32.0%
työntekijä371
32.0%
freelancer23
 
2.0%
yrittäjä22
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n1136
12.2%
l1136
12.2%
e811
 
8.7%
i764
 
8.2%
k742
 
8.0%
742
 
8.0%
t415
 
4.5%
ä415
 
4.5%
a394
 
4.2%
j393
 
4.2%
Other values (10)2362
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7781
83.6%
Space Separator742
 
8.0%
Uppercase Letter416
 
4.5%
Other Punctuation371
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
n1136
14.6%
l1136
14.6%
e811
10.4%
i764
9.8%
k742
9.5%
t415
 
5.3%
ä415
 
5.3%
a394
 
5.1%
j393
 
5.1%
y371
 
4.8%
Other values (5)1204
15.5%
ValueCountFrequency (%)
T371
89.2%
F23
 
5.5%
Y22
 
5.3%
ValueCountFrequency (%)
742
100.0%
ValueCountFrequency (%)
/371
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8197
88.0%
Common1113
 
12.0%

Most frequent character per script

ValueCountFrequency (%)
n1136
13.9%
l1136
13.9%
e811
9.9%
i764
9.3%
k742
9.1%
t415
 
5.1%
ä415
 
5.1%
a394
 
4.8%
j393
 
4.8%
T371
 
4.5%
Other values (8)1620
19.8%
ValueCountFrequency (%)
742
66.7%
/371
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8524
91.6%
None786
 
8.4%

Most frequent character per block

ValueCountFrequency (%)
n1136
13.3%
l1136
13.3%
e811
9.5%
i764
9.0%
k742
8.7%
742
8.7%
t415
 
4.9%
a394
 
4.6%
j393
 
4.6%
T371
 
4.4%
Other values (8)1620
19.0%
ValueCountFrequency (%)
ä415
52.8%
ö371
47.2%

Työaika
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 6
Distinct (%): 1.5%
Missing: 15
Missing (%): 3.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.083333333
Minimum: 0.5
Maximum: 40
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 0.5
5th percentile: 0.8
Q1: 1
Median: 1
Q3: 1
95th percentile: 1
Maximum: 40
Range: 39.5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.946746332
Coefficient of variation (CV): 1.796996614
Kurtosis: 401.235778
Mean: 1.083333333
Median Absolute Deviation (MAD): 0
Skewness: 20.02126569
Sum: 435.5
Variance: 3.78982128
Monotonicity: Not monotonic
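Työaika takes only six distinct values, so the whole non-missing column (402 rows) can be rebuilt from the common-values table that follows, and the statistics above can be re-derived from it. The stdlib sketch below does this. It assumes the report uses pandas' default estimators: adjusted Fisher-Pearson skewness and bias-corrected excess kurtosis.

```python
import math

# Työaika value counts from this section's common-values table (15 missing rows excluded).
TYOAIKA_COUNTS = {1.0: 378, 0.8: 19, 0.5: 2, 40.0: 1, 0.6: 1, 0.7: 1}


def expand(counts):
    """Turn a {value: count} table back into a flat list of observations."""
    return [value for value, n in counts.items() for _ in range(n)]


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (the estimator pandas' Series.skew uses)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis G2 (the estimator pandas' Series.kurt uses)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


xs = expand(TYOAIKA_COUNTS)
print(len(xs), math.fsum(xs))  # 402 435.5
print(round(sample_skewness(xs), 2), round(sample_excess_kurtosis(xs), 1))
```

With these counts the sketch reproduces the Sum of 435.5 and, up to rounding, the Skewness (about 20.02) and Kurtosis (about 401.2) shown above. The single value 40 drives both figures. It may be an answer given in hours per week, where the other answers are fractions of full time, but that reading is an assumption.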
Histogram with fixed size bins (bins=6)
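Common values
Value | Count | Frequency (%)
1 | 378 | 90.6%
0.8 | 19 | 4.6%
0.5 | 2 | 0.5%
40 | 1 | 0.2%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%
(Missing) | 15 | 3.6%

Minimum 5 values
Value | Count | Frequency (%)
0.5 | 2 | 0.5%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%
0.8 | 19 | 4.6%
1 | 378 | 90.6%

Maximum 5 values
Value | Count | Frequency (%)
40 | 1 | 0.2%
1 | 378 | 90.6%
0.8 | 19 | 4.6%
0.7 | 1 | 0.2%
0.6 | 1 | 0.2%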

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 223
Distinct (%): 54.8%
Missing: 10
Missing (%): 2.4%
Memory size: 3.4 KiB
Ohjelmistokehittäjä: 33
full-stack: 28
Full-stack: 21
Arkkitehti: 15
ohjelmistokehittäjä: 14
Other values (218): 296

Length

Max length: 67
Median length: 18
Mean length: 19.22113022
Min length: 2

Characters and Unicode

Total characters: 7823
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 181
Unique (%): 44.5%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Value | Count | Frequency (%)
Ohjelmistokehittäjä | 33 | 7.9%
full-stack | 28 | 6.7%
Full-stack | 21 | 5.0%
Arkkitehti | 15 | 3.6%
ohjelmistokehittäjä | 14 | 3.4%
Full-stack ohjelmistokehittäjä | 8 | 1.9%
full-stack ohjelmistokehittäjä | 6 | 1.4%
arkkitehti | 6 | 1.4%
DevOps | 5 | 1.2%
Frontend | 5 | 1.2%
Other values (213) | 266 | 63.8%
(Missing) | 10 | 2.4%
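Part of Rooli's high cardinality comes from spelling variants of the same role that differ only in case. Examples from the table above: Ohjelmistokehittäjä (33) and ohjelmistokehittäjä (14), full-stack (28) and Full-stack (21), Arkkitehti (15) and arkkitehti (6). The report itself does not merge these. The sketch below shows one possible cleaning step, under the assumption that case and extra whitespace carry no meaning. `normalise_role` is a hypothetical helper, not part of pandas-profiling.

```python
from collections import Counter


def normalise_role(value):
    """Collapse case and surrounding/internal whitespace so 'Full-stack' == 'full-stack'."""
    if value is None:
        return None
    return " ".join(value.split()).casefold()


def merged_counts(values):
    """Count roles after normalisation, skipping missing values."""
    return Counter(normalise_role(v) for v in values if v is not None)


roles = ["Full-stack", "full-stack", "Ohjelmistokehittäjä", "ohjelmistokehittäjä ", None]
print(merged_counts(roles))
```

Merging in this way lowers the distinct count without touching genuinely different roles. Synonyms in different languages, such as "developer" and "kehittäjä", would still need a manual mapping.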
Histogram of lengths of the category
ValueCountFrequency (%)
full-stack125
 
16.7%
ohjelmistokehittäjä97
 
13.0%
developer52
 
7.0%
arkkitehti33
 
4.4%
29
 
3.9%
lead27
 
3.6%
frontend23
 
3.1%
senior18
 
2.4%
kehittäjä15
 
2.0%
backend13
 
1.7%
Other values (164)316
42.2%

Most occurring characters

ValueCountFrequency (%)
t818
 
10.5%
e717
 
9.2%
i577
 
7.4%
l575
 
7.4%
k441
 
5.6%
o408
 
5.2%
s373
 
4.8%
a373
 
4.8%
345
 
4.4%
h314
 
4.0%
Other values (47)2882
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6805
87.0%
Uppercase Letter383
 
4.9%
Space Separator346
 
4.4%
Dash Punctuation150
 
1.9%
Other Punctuation85
 
1.1%
Open Punctuation23
 
0.3%
Close Punctuation23
 
0.3%
Math Symbol8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
t818
12.0%
e717
 
10.5%
i577
 
8.5%
l575
 
8.4%
k441
 
6.5%
o408
 
6.0%
s373
 
5.5%
a373
 
5.5%
h314
 
4.6%
j294
 
4.3%
Other values (16)1915
28.1%
ValueCountFrequency (%)
F88
23.0%
O79
20.6%
S43
11.2%
D38
9.9%
A24
 
6.3%
T20
 
5.2%
L16
 
4.2%
C13
 
3.4%
E10
 
2.6%
K8
 
2.1%
Other values (11)44
11.5%
ValueCountFrequency (%)
,50
58.8%
/31
36.5%
&3
 
3.5%
.1
 
1.2%
ValueCountFrequency (%)
345
99.7%
 1
 
0.3%
ValueCountFrequency (%)
-150
100.0%
ValueCountFrequency (%)
(23
100.0%
ValueCountFrequency (%)
)23
100.0%
ValueCountFrequency (%)
+8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7188
91.9%
Common635
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
t818
 
11.4%
e717
 
10.0%
i577
 
8.0%
l575
 
8.0%
k441
 
6.1%
o408
 
5.7%
s373
 
5.2%
a373
 
5.2%
h314
 
4.4%
j294
 
4.1%
Other values (37)2298
32.0%
ValueCountFrequency (%)
345
54.3%
-150
23.6%
,50
 
7.9%
/31
 
4.9%
(23
 
3.6%
)23
 
3.6%
+8
 
1.3%
&3
 
0.5%
.1
 
0.2%
 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7527
96.2%
None296
 
3.8%

Most frequent character per block

ValueCountFrequency (%)
t818
 
10.9%
e717
 
9.5%
i577
 
7.7%
l575
 
7.6%
k441
 
5.9%
o408
 
5.4%
s373
 
5.0%
a373
 
5.0%
345
 
4.6%
h314
 
4.2%
Other values (44)2586
34.4%
ValueCountFrequency (%)
ä280
94.6%
ö15
 
5.1%
 1
 
0.3%

Etä
Categorical

MISSING

Distinct: 2
Distinct (%): 0.7%
Missing: 145
Missing (%): 34.8%
Memory size: 669.0 B
Etä: 174
50/50: 98

Length

Max length: 5
Median length: 3
Mean length: 3.720588235
Min length: 3

Characters and Unicode

Total characters: 1012
Distinct characters: 6
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Value | Count | Frequency (%)
Etä | 174 | 41.7%
50/50 | 98 | 23.5%
(Missing) | 145 | 34.8%
Histogram of lengths of the category
Value | Count | Frequency (%)
etä | 174 | 64.0%
50/50 | 98 | 36.0%

Most occurring characters

Value | Count | Frequency (%)
5 | 196 | 19.4%
0 | 196 | 19.4%
E | 174 | 17.2%
t | 174 | 17.2%
ä | 174 | 17.2%
/ | 98 | 9.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number392
38.7%
Lowercase Letter348
34.4%
Uppercase Letter174
17.2%
Other Punctuation98
 
9.7%

Most frequent character per category

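Decimal Number
Value | Count | Frequency (%)
5 | 196 | 50.0%
0 | 196 | 50.0%

Lowercase Letter
Value | Count | Frequency (%)
t | 174 | 50.0%
ä | 174 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 98 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
E | 174 | 100.0%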

Most occurring scripts

ValueCountFrequency (%)
Latin522
51.6%
Common490
48.4%

Most frequent character per script

Common
Value | Count | Frequency (%)
5 | 196 | 40.0%
0 | 196 | 40.0%
/ | 98 | 20.0%

Latin
Value | Count | Frequency (%)
E | 174 | 33.3%
t | 174 | 33.3%
ä | 174 | 33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII838
82.8%
None174
 
17.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
5 | 196 | 23.4%
0 | 196 | 23.4%
E | 174 | 20.8%
t | 174 | 20.8%
/ | 98 | 11.7%

None
Value | Count | Frequency (%)
ä | 174 | 100.0%

Kuukausipalkka
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 35
Missing (%): 8.4%
Memory size: 3.4 KiB

Vuositulot
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB
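The report does not say why Kuukausipalkka and Vuositulot were rejected as unsupported. A common cause is a column that mixes numbers with free-text entries. Assuming that is the case here, a defensive parser might look like the sketch below. The raw values are not shown beyond the sample rows, so the accepted input styles in `parse_amount` (a hypothetical helper) are illustrative only. Anything it cannot parse is returned as None for manual review rather than guessed.

```python
import math
import re


def parse_amount(raw):
    """Best-effort conversion of a salary entry to a float; returns None if unparseable.

    Assumed (hypothetical) input styles: 6500, "6500", "6 500", "6500 €", "6500,50".
    """
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return None if math.isnan(raw) else float(raw)
    text = str(raw).replace("\u00a0", " ").replace("€", "").strip()
    text = text.replace(" ", "")   # drop thousands separators written as spaces
    text = text.replace(",", ".")  # Finnish decimal comma -> decimal point
    if not re.fullmatch(r"[0-9]+(\.[0-9]+)?", text):
        return None                # leave anything ambiguous for manual review
    return float(text)


for raw in [6500, "6 500", "6500,50", "6\u00a0500 €", "noin 4k", None]:
    print(repr(raw), "->", parse_amount(raw))
```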

Kilpailukykyinen
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.5%
Missing: 13
Missing (%): 3.1%
Memory size: 3.4 KiB
True: 283
False: 121
(Missing): 13

Value | Count | Frequency (%)
True | 283 | 67.9%
False | 121 | 29.0%
(Missing) | 13 | 3.1%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 65
Distinct (%): 69.1%
Missing: 323
Missing (%): 77.5%
Memory size: 3.4 KiB
Gofore: 11
Vincit: 6
Fraktio: 4
Futurice: 4
Arado: 3
Other values (60): 66

Length

Max length: 132
Median length: 7
Mean length: 10.69148936
Min length: 2

Characters and Unicode

Total characters: 1005
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 55
Unique (%): 58.5%

Sample

1st row: Questrade
2nd row: Digia Oyj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Value | Count | Frequency (%)
Gofore | 11 | 2.6%
Vincit | 6 | 1.4%
Fraktio | 4 | 1.0%
Futurice | 4 | 1.0%
Arado | 3 | 0.7%
Pankki | 3 | 0.7%
KVTES-alainen kunnan omistama oy | 2 | 0.5%
Gofore Oyj | 2 | 0.5%
Qvik | 2 | 0.5%
Siili | 2 | 0.5%
Other values (55) | 55 | 13.2%
(Missing) | 323 | 77.5%
Histogram of lengths of the category
ValueCountFrequency (%)
gofore13
 
8.6%
oy9
 
6.0%
vincit6
 
4.0%
oyj4
 
2.6%
fraktio4
 
2.6%
futurice4
 
2.6%
omistama3
 
2.0%
siili3
 
2.0%
arado3
 
2.0%
pankki3
 
2.0%
Other values (88)99
65.6%

Most occurring characters

ValueCountFrequency (%)
i100
 
10.0%
o79
 
7.9%
a78
 
7.8%
t71
 
7.1%
e69
 
6.9%
60
 
6.0%
r54
 
5.4%
n50
 
5.0%
l41
 
4.1%
u40
 
4.0%
Other values (43)363
36.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter813
80.9%
Uppercase Letter126
 
12.5%
Space Separator60
 
6.0%
Other Punctuation3
 
0.3%
Dash Punctuation3
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
G14
 
11.1%
O13
 
10.3%
V12
 
9.5%
S11
 
8.7%
F9
 
7.1%
A7
 
5.6%
K7
 
5.6%
C6
 
4.8%
P6
 
4.8%
E5
 
4.0%
Other values (15)36
28.6%
ValueCountFrequency (%)
i100
12.3%
o79
9.7%
a78
9.6%
t71
 
8.7%
e69
 
8.5%
r54
 
6.6%
n50
 
6.2%
l41
 
5.0%
u40
 
4.9%
k37
 
4.6%
Other values (15)194
23.9%
ValueCountFrequency (%)
60
100.0%
ValueCountFrequency (%)
.3
100.0%
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin939
93.4%
Common66
 
6.6%

Most frequent character per script

ValueCountFrequency (%)
i100
 
10.6%
o79
 
8.4%
a78
 
8.3%
t71
 
7.6%
e69
 
7.3%
r54
 
5.8%
n50
 
5.3%
l41
 
4.4%
u40
 
4.3%
k37
 
3.9%
Other values (40)320
34.1%
ValueCountFrequency (%)
60
90.9%
.3
 
4.5%
-3
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII993
98.8%
None12
 
1.2%

Most frequent character per block

ValueCountFrequency (%)
i100
 
10.1%
o79
 
8.0%
a78
 
7.9%
t71
 
7.2%
e69
 
6.9%
60
 
6.0%
r54
 
5.4%
n50
 
5.0%
l41
 
4.1%
u40
 
4.0%
Other values (41)351
35.3%
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 32
Distinct (%): 97.0%
Missing: 384
Missing (%): 92.1%
Memory size: 3.4 KiB
palkan lisänä lounas- ja virkistysetu: 2
Korona-aika on lisännyt etätyön määrää. Aiemmin pari päivää viikossa etänä, nyt kokonaan. Paluuta vanhaan ei varmaankaan ole, ehkä päivä viikossa konttorilla ihan sosiaalisten kontaktien takia.: 1
Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys.: 1
Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä.: 1
Opiskelija: 1
Other values (27): 27

Length

Max length: 286
Median length: 71
Mean length: 95.54545455
Min length: 7

Characters and Unicode

Total characters: 3153
Distinct characters: 55
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Value | Count | Frequency (%)
palkan lisänä lounas- ja virkistysetu | 2 | 0.5%
Korona-aika on lisännyt etätyön määrää. Aiemmin pari päivää viikossa etänä, nyt kokonaan. Paluuta vanhaan ei varmaankaan ole, ehkä päivä viikossa konttorilla ihan sosiaalisten kontaktien takia. | 1 | 0.2%
Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys. | 1 | 0.2%
Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä. | 1 | 0.2%
Opiskelija | 1 | 0.2%
Ennen koronaa oli osittainen etätyö, koronan jälkeen 100% | 1 | 0.2%
startup, palkan lisäksi optiopaketti. | 1 | 0.2%
olen sekä päivätyöläinen että friikku. jospa nyt kuitenki vois valita monta? | 1 | 0.2%
Osittain laskutukseen perustuva palkka joten vaihtelee. | 1 | 0.2%
Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi | 1 | 0.2%
Other values (22) | 22 | 5.3%
(Missing) | 384 | 92.1%
Histogram of lengths of the category
ValueCountFrequency (%)
ei10
 
2.5%
palkka9
 
2.3%
on8
 
2.0%
mutta6
 
1.5%
ole6
 
1.5%
ja6
 
1.5%
firman4
 
1.0%
palkan4
 
1.0%
joten4
 
1.0%
nyt4
 
1.0%
Other values (281)334
84.6%

Most occurring characters

ValueCountFrequency (%)
365
11.6%
a331
10.5%
i271
 
8.6%
t247
 
7.8%
n216
 
6.9%
s205
 
6.5%
e203
 
6.4%
k185
 
5.9%
l161
 
5.1%
o146
 
4.6%
Other values (45)823
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2625
83.3%
Space Separator365
 
11.6%
Other Punctuation74
 
2.3%
Uppercase Letter47
 
1.5%
Decimal Number27
 
0.9%
Dash Punctuation6
 
0.2%
Open Punctuation3
 
0.1%
Close Punctuation3
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a331
12.6%
i271
10.3%
t247
9.4%
n216
 
8.2%
s205
 
7.8%
e203
 
7.7%
k185
 
7.0%
l161
 
6.1%
o146
 
5.6%
u118
 
4.5%
Other values (14)542
20.6%
ValueCountFrequency (%)
T7
14.9%
P7
14.9%
O6
12.8%
E6
12.8%
V6
12.8%
S4
8.5%
K3
6.4%
I2
 
4.3%
H2
 
4.3%
R1
 
2.1%
Other values (3)3
6.4%
Decimal Number
Value | Count | Frequency (%)
0 | 15 | 55.6%
1 | 3 | 11.1%
5 | 2 | 7.4%
2 | 2 | 7.4%
8 | 2 | 7.4%
6 | 2 | 7.4%
3 | 1 | 3.7%
ValueCountFrequency (%)
.38
51.4%
,23
31.1%
/5
 
6.8%
%4
 
5.4%
"2
 
2.7%
?2
 
2.7%
ValueCountFrequency (%)
365
100.0%
ValueCountFrequency (%)
(3
100.0%
ValueCountFrequency (%)
)3
100.0%
ValueCountFrequency (%)
+3
100.0%
ValueCountFrequency (%)
-6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2672
84.7%
Common481
 
15.3%

Most frequent character per script

ValueCountFrequency (%)
a331
12.4%
i271
10.1%
t247
9.2%
n216
 
8.1%
s205
 
7.7%
e203
 
7.6%
k185
 
6.9%
l161
 
6.0%
o146
 
5.5%
u118
 
4.4%
Other values (27)589
22.0%
Common
Value | Count | Frequency (%)
(space) | 365 | 75.9%
. | 38 | 7.9%
, | 23 | 4.8%
0 | 15 | 3.1%
- | 6 | 1.2%
/ | 5 | 1.0%
% | 4 | 0.8%
( | 3 | 0.6%
) | 3 | 0.6%
+ | 3 | 0.6%
Other values (8) | 16 | 3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3017
95.7%
None136
 
4.3%

Most frequent character per block

ValueCountFrequency (%)
365
12.1%
a331
11.0%
i271
 
9.0%
t247
 
8.2%
n216
 
7.2%
s205
 
6.8%
e203
 
6.7%
k185
 
6.1%
l161
 
5.3%
o146
 
4.8%
Other values (43)687
22.8%
ValueCountFrequency (%)
ä112
82.4%
ö24
 
17.6%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exactly linear relationship with any non-zero slope, r is therefore ±1; the steepness of the line does not affect r, only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
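The definition above translates directly into code. The stdlib sketch below divides the covariance by the product of the standard deviations. The 1/(n-1) factors cancel, so plain sums are used. It assumes two equally long numeric sequences with no missing values.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sx * sy)  # the 1/(n-1) factors cancel


xs = [1, 2, 3, 4]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # close to 1.0
print(pearson_r(xs, [-3 * x + 5 for x in xs]))  # close to -1.0
```

The second call also illustrates the invariance noted above: a steeper, negatively sloped line still gives r = -1.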

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r does. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
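In other words, ρ is Pearson's r applied to ranks. The sketch below assigns ranks 1 to n, giving tied values the average of the ranks they span (the usual convention, assumed here), and then applies the Pearson formula. Because a cubic is monotonic, x and x³ get ρ = 1 even though their relationship is not linear.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))  # close to 1.0
```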

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τa form; the τb variant adjusts the denominator for ties.
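The pair-counting definition (τa) can be written as a direct O(n²) loop over all pairs. In this sketch, tied pairs count as neither concordant nor discordant. Tie-heavy data would normally use the τb variant instead; SciPy's `scipy.stats.kendalltau` computes τb by default.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs n(n-1)/2.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```

In the example, 5 of the 6 pairs are concordant and 1 is discordant, giving (5 - 1) / 6 = 2/3.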

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V is biased (it tends to overstate the association), and the bias can be considerable even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
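The bias correction is commonly stated as follows. Start from φ² = χ²/n. Subtract the expected bias (k-1)(r-1)/(n-1), clipping at zero. Shrink the numbers of rows and columns in the same way. The stdlib sketch below follows that formula for an r × k contingency table built from two equally long categorical sequences. Whether pandas-profiling computes it in exactly this way is an assumption.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma 2013) for two equally long categorical sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    # Pearson chi-squared statistic over every cell of the r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when one variable is constant
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["c", "d", "c", "d"] * 25)
print(perfect, independent)
```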

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
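Nullity correlation can be understood as the Pearson correlation between two columns' is-missing indicators (1 if the value is missing, 0 otherwise). A value near +1 means the columns tend to be missing together; a value near -1 means one is present when the other is missing. The stdlib sketch below assumes missing values are represented as None. pandas-profiling appears to draw these diagrams with the missingno package, whose implementation may differ in details such as how columns that are always or never missing are handled.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when either column is always or never missing (zero variance).
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return None
    return cov / (sa * sb)


workplace = ["Gofore", None, "Vincit", None]
comment = ["text", None, "text", None]
print(nullity_correlation(workplace, comment))  # 1.0: always missing together
```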

Sample

First rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500 | 83000 | True | NaN | NaN
1 | 2021-02-15 11:57:19.676 | Turku | 31-35 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000 | 62500 | True | NaN | NaN
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475 | 30000 | False | NaN | NaN
3 | 2021-02-15 11:58:15.261 | Tampere | 31-35 v | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300 | 100000 | True | NaN | NaN
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000 | 37500 | False | NaN | NaN
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 41-45 v | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 8000 | 100000 | True | NaN | NaN
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 31-35 v | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000 | 140000 | True | NaN | NaN
7 | 2021-02-15 12:00:04.655 | Tampere | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4250 | 54000 | True | NaN | NaN
8 | 2021-02-15 12:01:00.769 | Tampere | 31-35 v | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | NaN | 4000 | 50000 | False | NaN | NaN
9 | 2021-02-15 12:02:03.577 | Tallinna | 31-35 v | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000 | True | Questrade | NaN

Last rows

Row | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
407 | 2021-02-19 14:44:18.231 | PK-Seutu | 31-35 v | NaN | 5.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 2900 | 36000 | False | NaN | NaN
408 | 2021-02-19 14:48:10.772 | Viimsi | 36-40 v | mies | 20.0 | Yrittäjä | NaN | sysadmin | Etä | NaN | 110000 | True | NaN | NaN
409 | 2021-02-19 14:54:21.221 | Tampere | 36-40 v | NaN | 12.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistosuunnittelija | NaN | 3800 | 50000 | False | NaN | NaN
410 | 2021-02-19 15:01:20.423 | Turku | 31-35 v | mies | 9.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | NaN | 3900 | 52000 | False | NaN | NaN
411 | 2021-02-19 15:06:06.295 | PK-Seutu | 36-40 v | nainen | 14.0 | Työntekijä / palkollinen | 1.0 | Senior consultant | Etä | 8500 | 100000 | True | Sulava | NaN
412 | 2021-02-19 15:13:51.743 | Pori | 36-40 v | mies | 8.0 | Työntekijä / palkollinen | 1.0 | Tech Lead | Etä | 5080 | 65000 | False | Iso konsulttitalo | Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys.
413 | 2021-02-19 15:24:01.085 | Tampere | 36-40 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistotestaaja | Etä | 4100 | 55000 | True | NaN | NaN
414 | 2021-02-19 15:34:53.741 | Tampere | 26-30 v | muu | 7.0 | Työntekijä / palkollinen | 1.0 | Full-stack developer | 50/50 | 5550 | 69400 | True | NaN | NaN
415 | 2021-02-19 15:40:16.336 | PK-Seutu | 26-30 v | mies | 5.0 | Työntekijä / palkollinen | 0.8 | Full-stack/mobiili/design | Etä | 7000 | 90000 | True | Mavericks | NaN
416 | 2021-02-19 16:04:50.348 | Tampere | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4800 | 65000 | True | NaN | Bonukset riippuu firman tuloksesta. Palkka olisi varmastikin enemmän muualla mutta uskoakseni linjassa kollegoideni kanssa.