Overview

Dataset statistics

Number of variables: 14
Number of observations: 428
Missing cells: 1003
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.6 KiB
Average record size in memory: 87.5 B
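
The missing-cell figures can be cross-checked against the per-variable missing counts listed in the variable sections of this report. The following is a minimal sketch of that arithmetic; the dictionary name `missing_per_variable` is only illustrative, and all numbers are copied from the report itself.

```python
# Missing counts per variable, copied from the variable sections of this report.
missing_per_variable = {
    "Timestamp": 0,
    "Kaupunki": 4,
    "Ikä": 2,
    "Sukupuoli": 32,
    "Työkokemus": 4,
    "Työsuhteen luonne": 1,
    "Työaika": 16,
    "Rooli": 10,
    "Etä": 146,
    "Kuukausipalkka": 38,
    "Vuositulot": 11,
    "Kilpailukykyinen": 14,
    "Työpaikka": 330,
    "Vapaa sana": 395,
}
n_observations = 428

# Total missing cells is the sum over all variables.
missing_cells = sum(missing_per_variable.values())
# Total cells is variables times observations.
total_cells = len(missing_per_variable) * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, total_cells, f"{missing_pct:.1f}%")
```

The sum of the per-variable counts matches the reported number of missing cells, and dividing by 14 × 428 cells gives the reported percentage.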

Variable types

DateTime: 1
Categorical: 9
Numeric: 3
Boolean: 1

Warnings

Rooli has a high cardinality: 228 distinct values [High cardinality]
Työpaikka has a high cardinality: 67 distinct values [High cardinality]
Työpaikka is highly correlated with Vapaa sana [High correlation]
Vapaa sana is highly correlated with Työpaikka and 1 other field [High correlation]
Kilpailukykyinen is highly correlated with Vapaa sana [High correlation]
Sukupuoli has 32 (7.5%) missing values [Missing]
Työaika has 16 (3.7%) missing values [Missing]
Rooli has 10 (2.3%) missing values [Missing]
Etä has 146 (34.1%) missing values [Missing]
Kuukausipalkka has 38 (8.9%) missing values [Missing]
Vuositulot has 11 (2.6%) missing values [Missing]
Kilpailukykyinen has 14 (3.3%) missing values [Missing]
Työpaikka has 330 (77.1%) missing values [Missing]
Vapaa sana has 395 (92.3%) missing values [Missing]
Vapaa sana is uniformly distributed [Uniform]
Timestamp has unique values [Unique]

Reproduction

Analysis started: 2021-02-19 15:58:09.896543
Analysis finished: 2021-02-19 15:58:13.907467
Duration: 4.01 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

Timestamp
Date

UNIQUE

Distinct: 428
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-19 17:51:37.178000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

Distinct: 25
Distinct (%): 5.9%
Missing: 4
Missing (%): 0.9%
Memory size: 1.3 KiB

Value | Count
PK-Seutu | 218
Tampere | 99
Turku | 43
Oulu | 22
Jyväskylä | 17
Other values (20) | 25

Length

Max length: 15
Median length: 8
Mean length: 7.257075472
Min length: 2

Characters and Unicode

Total characters: 3077
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
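
As a rough illustration of how such character properties can be read, the sketch below counts characters per Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are assumptions made for this example; they are not taken from the profiling tool, whose own implementation may differ.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report (illustrative subset).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across a list of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return dict(counts)


print(category_counts(["PK-Seutu", "31-35 v"]))
```

Applied to a whole column, the same tally would give a breakdown like the "Most occurring categories" tables in this report.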

Unique

Unique: 18
Unique (%): 4.2%

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu

Value | Count | Frequency (%)
PK-Seutu | 218 | 50.9%
Tampere | 99 | 23.1%
Turku | 43 | 10.0%
Oulu | 22 | 5.1%
Jyväskylä | 17 | 4.0%
Kuopio | 5 | 1.2%
Pori | 2 | 0.5%
Ruotsi | 1 | 0.2%
Wien | 1 | 0.2%
Viimsi | 1 | 0.2%
Other values (15) | 15 | 3.5%
(Missing) | 4 | 0.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
pk-seutu | 218 | 50.9%
tampere | 99 | 23.1%
turku | 43 | 10.0%
oulu | 22 | 5.1%
jyväskylä | 17 | 4.0%
kuopio | 5 | 1.2%
pori | 2 | 0.5%
lahti | 1 | 0.2%
tallinna | 1 | 0.2%
etänä | 1 | 0.2%
Other values (19) | 19 | 4.4%

Most occurring characters

ValueCountFrequency (%)
u578
18.8%
e425
13.8%
K226
 
7.3%
t223
 
7.2%
P221
 
7.2%
-220
 
7.1%
S220
 
7.1%
r148
 
4.8%
T143
 
4.6%
a116
 
3.8%
Other values (29)557
18.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1987
64.6%
Uppercase Letter865
28.1%
Dash Punctuation220
 
7.1%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
u578
29.1%
e425
21.4%
t223
 
11.2%
r148
 
7.4%
a116
 
5.8%
p105
 
5.3%
m103
 
5.2%
k61
 
3.1%
l47
 
2.4%
ä41
 
2.1%
Other values (10)140
 
7.0%
ValueCountFrequency (%)
K226
26.1%
P221
25.5%
S220
25.4%
T143
16.5%
O22
 
2.5%
J18
 
2.1%
E3
 
0.3%
L3
 
0.3%
V2
 
0.2%
W1
 
0.1%
Other values (6)6
 
0.7%
ValueCountFrequency (%)
-220
100.0%
ValueCountFrequency (%)
4
100.0%
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2852
92.7%
Common225
 
7.3%

Most frequent character per script

ValueCountFrequency (%)
u578
20.3%
e425
14.9%
K226
 
7.9%
t223
 
7.8%
P221
 
7.7%
S220
 
7.7%
r148
 
5.2%
T143
 
5.0%
a116
 
4.1%
p105
 
3.7%
Other values (26)447
15.7%
ValueCountFrequency (%)
-220
97.8%
4
 
1.8%
,1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII3036
98.7%
None41
 
1.3%

Most frequent character per block

ValueCountFrequency (%)
u578
19.0%
e425
14.0%
K226
 
7.4%
t223
 
7.3%
P221
 
7.3%
-220
 
7.2%
S220
 
7.2%
r148
 
4.9%
T143
 
4.7%
a116
 
3.8%
Other values (28)516
17.0%
ValueCountFrequency (%)
ä41
100.0%

Ikä
Categorical

Distinct: 7
Distinct (%): 1.6%
Missing: 2
Missing (%): 0.5%
Memory size: 912.0 B

Value | Count
31-35 v | 142
26-30 v | 104
36-40 v | 94
41-45 v | 47
21-25 v | 27
Other values (2) | 12

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2982
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31-35 v
2nd row: 31-35 v
3rd row: 26-30 v
4th row: 31-35 v
5th row: 26-30 v

Value | Count | Frequency (%)
31-35 v | 142 | 33.2%
26-30 v | 104 | 24.3%
36-40 v | 94 | 22.0%
41-45 v | 47 | 11.0%
21-25 v | 27 | 6.3%
46-50 v | 7 | 1.6%
51-55 v | 5 | 1.2%
(Missing) | 2 | 0.5%

Histogram of lengths of the category

Value | Count | Frequency (%)
v | 426 | 50.0%
31-35 | 142 | 16.7%
26-30 | 104 | 12.2%
36-40 | 94 | 11.0%
41-45 | 47 | 5.5%
21-25 | 27 | 3.2%
46-50 | 7 | 0.8%
51-55 | 5 | 0.6%

Most occurring characters

ValueCountFrequency (%)
3482
16.2%
-426
14.3%
426
14.3%
v426
14.3%
5238
8.0%
1221
7.4%
6205
6.9%
0205
6.9%
4195
6.5%
2158
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1704
57.1%
Dash Punctuation426
 
14.3%
Space Separator426
 
14.3%
Lowercase Letter426
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
3482
28.3%
5238
14.0%
1221
13.0%
6205
12.0%
0205
12.0%
4195
11.4%
2158
 
9.3%
ValueCountFrequency (%)
-426
100.0%
ValueCountFrequency (%)
426
100.0%
ValueCountFrequency (%)
v426
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2556
85.7%
Latin426
 
14.3%

Most frequent character per script

ValueCountFrequency (%)
3482
18.9%
-426
16.7%
426
16.7%
5238
9.3%
1221
8.6%
6205
8.0%
0205
8.0%
4195
7.6%
2158
 
6.2%
ValueCountFrequency (%)
v426
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2982
100.0%

Most frequent character per block

ValueCountFrequency (%)
3482
16.2%
-426
14.3%
426
14.3%
v426
14.3%
5238
8.0%
1221
7.4%
6205
6.9%
0205
6.9%
4195
6.5%
2158
 
5.3%

Sukupuoli
Categorical

MISSING

Distinct: 3
Distinct (%): 0.8%
Missing: 32
Missing (%): 7.5%
Memory size: 688.0 B

Value | Count
mies | 359
nainen | 29
muu | 8

Length

Max length: 6
Median length: 4
Mean length: 4.126262626
Min length: 3

Characters and Unicode

Total characters: 1634
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Value | Count | Frequency (%)
mies | 359 | 83.9%
nainen | 29 | 6.8%
muu | 8 | 1.9%
(Missing) | 32 | 7.5%

Histogram of lengths of the category

Value | Count | Frequency (%)
mies | 359 | 90.7%
nainen | 29 | 7.3%
muu | 8 | 2.0%

Most occurring characters

ValueCountFrequency (%)
i388
23.7%
e388
23.7%
m367
22.5%
s359
22.0%
n87
 
5.3%
a29
 
1.8%
u16
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1634
100.0%

Most frequent character per category

ValueCountFrequency (%)
i388
23.7%
e388
23.7%
m367
22.5%
s359
22.0%
n87
 
5.3%
a29
 
1.8%
u16
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1634
100.0%

Most frequent character per script

ValueCountFrequency (%)
i388
23.7%
e388
23.7%
m367
22.5%
s359
22.0%
n87
 
5.3%
a29
 
1.8%
u16
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1634
100.0%

Most frequent character per block

ValueCountFrequency (%)
i388
23.7%
e388
23.7%
m367
22.5%
s359
22.0%
n87
 
5.3%
a29
 
1.8%
u16
 
1.0%

Työkokemus
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 6.4%
Missing: 4
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.636792453
Minimum: 0
Maximum: 30
Zeros: 3
Zeros (%): 0.7%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 9
Q3: 13
95-th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.058045866
Coefficient of variation (CV): 0.6286371628
Kurtosis: -0.03943807906
Mean: 9.636792453
Median Absolute Deviation (MAD): 4
Skewness: 0.7090849042
Sum: 4086
Variance: 36.69991971
Monotonicity: Not monotonic
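
Several of the statistics above are simple functions of one another. As a hedged sketch (variable names are illustrative, and all inputs are copied from this section), the relationships can be checked like this:

```python
# Figures copied from the Työkokemus section of this report.
n_observations = 428
n_missing = 4
total = 4086             # "Sum"
std_dev = 6.058045866    # "Standard deviation"
q1, q3 = 5, 13           # "Q1" and "Q3"
minimum, maximum = 0, 30

n_valid = n_observations - n_missing  # only non-missing values enter the statistics
mean = total / n_valid                # mean = sum / number of valid values
variance = std_dev ** 2               # variance is the squared standard deviation
cv = std_dev / mean                   # coefficient of variation
iqr = q3 - q1                         # interquartile range
value_range = maximum - minimum

print(round(mean, 6), round(variance, 4), round(cv, 6), iqr, value_range)
```

Up to rounding, the derived values agree with the mean, variance, CV, IQR and range reported above.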
Histogram with fixed size bins (bins=27)

Value | Count | Frequency (%)
5 | 44 | 10.3%
10 | 34 | 7.9%
4 | 29 | 6.8%
7 | 26 | 6.1%
15 | 25 | 5.8%
20 | 25 | 5.8%
3 | 24 | 5.6%
13 | 23 | 5.4%
2 | 22 | 5.1%
11 | 22 | 5.1%
Other values (17) | 150 | 35.0%

Value | Count | Frequency (%)
0 | 3 | 0.7%
1 | 15 | 3.5%
2 | 22 | 5.1%
3 | 24 | 5.6%
4 | 29 | 6.8%

Value | Count | Frequency (%)
30 | 2 | 0.5%
25 | 5 | 1.2%
24 | 2 | 0.5%
23 | 4 | 0.9%
22 | 4 | 0.9%

Työsuhteen luonne
Categorical

Distinct: 3
Distinct (%): 0.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.5 KiB

Value | Count
Työntekijä / palkollinen | 382
Freelancer | 23
Yrittäjä | 22

Length

Max length: 24
Median length: 24
Mean length: 22.42154567
Min length: 8

Characters and Unicode

Total characters: 9574
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Value | Count | Frequency (%)
Työntekijä / palkollinen | 382 | 89.3%
Freelancer | 23 | 5.4%
Yrittäjä | 22 | 5.1%
(Missing) | 1 | 0.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
(empty) | 382 | 32.1%
palkollinen | 382 | 32.1%
työntekijä | 382 | 32.1%
freelancer | 23 | 1.9%
yrittäjä | 22 | 1.8%

Most occurring characters

ValueCountFrequency (%)
n1169
12.2%
l1169
12.2%
e833
 
8.7%
i786
 
8.2%
k764
 
8.0%
764
 
8.0%
t426
 
4.4%
ä426
 
4.4%
a405
 
4.2%
j404
 
4.2%
Other values (10)2428
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8001
83.6%
Space Separator764
 
8.0%
Uppercase Letter427
 
4.5%
Other Punctuation382
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
n1169
14.6%
l1169
14.6%
e833
10.4%
i786
9.8%
k764
9.5%
t426
 
5.3%
ä426
 
5.3%
a405
 
5.1%
j404
 
5.0%
y382
 
4.8%
Other values (5)1237
15.5%
ValueCountFrequency (%)
T382
89.5%
F23
 
5.4%
Y22
 
5.2%
ValueCountFrequency (%)
764
100.0%
ValueCountFrequency (%)
/382
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8428
88.0%
Common1146
 
12.0%

Most frequent character per script

ValueCountFrequency (%)
n1169
13.9%
l1169
13.9%
e833
9.9%
i786
9.3%
k764
9.1%
t426
 
5.1%
ä426
 
5.1%
a405
 
4.8%
j404
 
4.8%
T382
 
4.5%
Other values (8)1664
19.7%
ValueCountFrequency (%)
764
66.7%
/382
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII8766
91.6%
None808
 
8.4%

Most frequent character per block

ValueCountFrequency (%)
n1169
13.3%
l1169
13.3%
e833
9.5%
i786
9.0%
k764
8.7%
764
8.7%
t426
 
4.9%
a405
 
4.6%
j404
 
4.6%
T382
 
4.4%
Other values (8)1664
19.0%
ValueCountFrequency (%)
ä426
52.7%
ö382
47.3%

Työaika
Categorical

MISSING

Distinct: 5
Distinct (%): 1.2%
Missing: 16
Missing (%): 3.7%
Memory size: 3.5 KiB

Value | Count
1.0 | 388
0.8 | 20
0.5 | 2
0.6 | 1
0.7 | 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1236
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
1.0 | 388 | 90.7%
0.8 | 20 | 4.7%
0.5 | 2 | 0.5%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%
(Missing) | 16 | 3.7%

Histogram of lengths of the category

Value | Count | Frequency (%)
1.0 | 388 | 94.2%
0.8 | 20 | 4.9%
0.5 | 2 | 0.5%
0.6 | 1 | 0.2%
0.7 | 1 | 0.2%

Most occurring characters

ValueCountFrequency (%)
.412
33.3%
0412
33.3%
1388
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number824
66.7%
Other Punctuation412
33.3%

Most frequent character per category

ValueCountFrequency (%)
0412
50.0%
1388
47.1%
820
 
2.4%
52
 
0.2%
71
 
0.1%
61
 
0.1%
ValueCountFrequency (%)
.412
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1236
100.0%

Most frequent character per script

ValueCountFrequency (%)
.412
33.3%
0412
33.3%
1388
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1236
100.0%

Most frequent character per block

ValueCountFrequency (%)
.412
33.3%
0412
33.3%
1388
31.4%
820
 
1.6%
52
 
0.2%
71
 
0.1%
61
 
0.1%

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 228
Distinct (%): 54.5%
Missing: 10
Missing (%): 2.3%
Memory size: 3.5 KiB

Value | Count
Ohjelmistokehittäjä | 33
full-stack | 30
Full-stack | 21
Arkkitehti | 15
ohjelmistokehittäjä | 15
Other values (223) | 304

Length

Max length: 67
Median length: 18
Mean length: 19.18660287
Min length: 2

Characters and Unicode

Total characters: 8020
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 184
Unique (%): 44.0%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Value | Count | Frequency (%)
Ohjelmistokehittäjä | 33 | 7.7%
full-stack | 30 | 7.0%
Full-stack | 21 | 4.9%
Arkkitehti | 15 | 3.5%
ohjelmistokehittäjä | 15 | 3.5%
Full-stack ohjelmistokehittäjä | 8 | 1.9%
full-stack ohjelmistokehittäjä | 6 | 1.4%
arkkitehti | 6 | 1.4%
Full-stack kehittäjä | 5 | 1.2%
DevOps | 5 | 1.2%
Other values (218) | 274 | 64.0%
(Missing) | 10 | 2.3%

Histogram of lengths of the category

Value | Count | Frequency (%)
full-stack | 127 | 16.5%
ohjelmistokehittäjä | 100 | 13.0%
developer | 52 | 6.8%
arkkitehti | 34 | 4.4%
(empty) | 30 | 3.9%
lead | 28 | 3.6%
frontend | 24 | 3.1%
senior | 18 | 2.3%
kehittäjä | 15 | 2.0%
backend | 14 | 1.8%
Other values (165) | 327 | 42.5%

Most occurring characters

ValueCountFrequency (%)
t840
 
10.5%
e735
 
9.2%
i590
 
7.4%
l585
 
7.3%
k451
 
5.6%
o419
 
5.2%
s383
 
4.8%
a382
 
4.8%
356
 
4.4%
h322
 
4.0%
Other values (47)2957
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6972
86.9%
Uppercase Letter396
 
4.9%
Space Separator357
 
4.5%
Dash Punctuation153
 
1.9%
Other Punctuation86
 
1.1%
Open Punctuation24
 
0.3%
Close Punctuation24
 
0.3%
Math Symbol8
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
t840
12.0%
e735
 
10.5%
i590
 
8.5%
l585
 
8.4%
k451
 
6.5%
o419
 
6.0%
s383
 
5.5%
a382
 
5.5%
h322
 
4.6%
j301
 
4.3%
Other values (16)1964
28.2%
ValueCountFrequency (%)
F90
22.7%
O80
20.2%
S45
11.4%
D39
9.8%
A24
 
6.1%
T20
 
5.1%
L17
 
4.3%
C13
 
3.3%
E11
 
2.8%
P10
 
2.5%
Other values (11)47
11.9%
ValueCountFrequency (%)
,50
58.1%
/32
37.2%
&3
 
3.5%
.1
 
1.2%
ValueCountFrequency (%)
356
99.7%
 1
 
0.3%
ValueCountFrequency (%)
-153
100.0%
ValueCountFrequency (%)
(24
100.0%
ValueCountFrequency (%)
)24
100.0%
ValueCountFrequency (%)
+8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7368
91.9%
Common652
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
t840
 
11.4%
e735
 
10.0%
i590
 
8.0%
l585
 
7.9%
k451
 
6.1%
o419
 
5.7%
s383
 
5.2%
a382
 
5.2%
h322
 
4.4%
j301
 
4.1%
Other values (37)2360
32.0%
ValueCountFrequency (%)
356
54.6%
-153
23.5%
,50
 
7.7%
/32
 
4.9%
(24
 
3.7%
)24
 
3.7%
+8
 
1.2%
&3
 
0.5%
.1
 
0.2%
 1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7716
96.2%
None304
 
3.8%

Most frequent character per block

ValueCountFrequency (%)
t840
 
10.9%
e735
 
9.5%
i590
 
7.6%
l585
 
7.6%
k451
 
5.8%
o419
 
5.4%
s383
 
5.0%
a382
 
5.0%
356
 
4.6%
h322
 
4.2%
Other values (44)2653
34.4%
ValueCountFrequency (%)
ä288
94.7%
ö15
 
4.9%
 1
 
0.3%

Etä
Categorical

MISSING

Distinct: 2
Distinct (%): 0.7%
Missing: 146
Missing (%): 34.1%
Memory size: 680.0 B

Value | Count
Etä | 180
50/50 | 102

Length

Max length: 5
Median length: 3
Mean length: 3.723404255
Min length: 3

Characters and Unicode

Total characters: 1050
Distinct characters: 6
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Value | Count | Frequency (%)
Etä | 180 | 42.1%
50/50 | 102 | 23.8%
(Missing) | 146 | 34.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
etä | 180 | 63.8%
50/50 | 102 | 36.2%

Most occurring characters

ValueCountFrequency (%)
5204
19.4%
0204
19.4%
E180
17.1%
t180
17.1%
ä180
17.1%
/102
9.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
38.9%
Lowercase Letter360
34.3%
Uppercase Letter180
17.1%
Other Punctuation102
 
9.7%

Most frequent character per category

ValueCountFrequency (%)
5204
50.0%
0204
50.0%
ValueCountFrequency (%)
t180
50.0%
ä180
50.0%
ValueCountFrequency (%)
/102
100.0%
ValueCountFrequency (%)
E180
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin540
51.4%
Common510
48.6%

Most frequent character per script

ValueCountFrequency (%)
5204
40.0%
0204
40.0%
/102
20.0%
ValueCountFrequency (%)
E180
33.3%
t180
33.3%
ä180
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII870
82.9%
None180
 
17.1%

Most frequent character per block

ValueCountFrequency (%)
5204
23.4%
0204
23.4%
E180
20.7%
t180
20.7%
/102
11.7%
ValueCountFrequency (%)
ä180
100.0%

Kuukausipalkka
Real number (ℝ≥0)

MISSING

Distinct: 117
Distinct (%): 30.0%
Missing: 38
Missing (%): 8.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4676.174359
Minimum: 1666
Maximum: 15000
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 1666
5-th percentile: 2809
Q1: 3862.5
Median: 4500
Q3: 5500
95-th percentile: 7000
Maximum: 15000
Range: 13334
Interquartile range (IQR): 1637.5

Descriptive statistics

Standard deviation: 1320.016661
Coefficient of variation (CV): 0.2822855949
Kurtosis: 8.947538524
Mean: 4676.174359
Median Absolute Deviation (MAD): 753
Skewness: 1.480402915
Sum: 1823708
Variance: 1742443.985
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
4000 | 23 | 5.4%
4500 | 20 | 4.7%
6000 | 16 | 3.7%
5500 | 15 | 3.5%
5000 | 15 | 3.5%
4800 | 11 | 2.6%
7000 | 11 | 2.6%
4300 | 11 | 2.6%
4200 | 10 | 2.3%
4100 | 9 | 2.1%
Other values (107) | 249 | 58.2%
(Missing) | 38 | 8.9%

Value | Count | Frequency (%)
1666 | 1 | 0.2%
1700 | 1 | 0.2%
1800 | 1 | 0.2%
2100 | 1 | 0.2%
2275 | 1 | 0.2%

Value | Count | Frequency (%)
15000 | 1 | 0.2%
8500 | 1 | 0.2%
8000 | 5 | 1.2%
7500 | 2 | 0.5%
7200 | 1 | 0.2%

Vuositulot
Real number (ℝ≥0)

MISSING

Distinct: 168
Distinct (%): 40.3%
Missing: 11
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 65316.10671
Minimum: 0
Maximum: 250000
Zeros: 2
Zeros (%): 0.5%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 35000
Q1: 50000
Median: 60000
Q3: 75000
95-th percentile: 120000
Maximum: 250000
Range: 250000
Interquartile range (IQR): 25000

Descriptive statistics

Standard deviation: 28626.85746
Coefficient of variation (CV): 0.4382817485
Kurtosis: 8.41775568
Mean: 65316.10671
Median Absolute Deviation (MAD): 12000
Skewness: 2.170441848
Sum: 27236816.5
Variance: 819496967.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
55000 | 17 | 4.0%
75000 | 15 | 3.5%
50000 | 14 | 3.3%
60000 | 14 | 3.3%
65000 | 10 | 2.3%
85000 | 9 | 2.1%
62500 | 9 | 2.1%
80000 | 9 | 2.1%
70000 | 8 | 1.9%
54000 | 8 | 1.9%
Other values (158) | 304 | 71.0%
(Missing) | 11 | 2.6%

Value | Count | Frequency (%)
0 | 2 | 0.5%
4000 | 1 | 0.2%
6100 | 1 | 0.2%
7500 | 1 | 0.2%
20000 | 1 | 0.2%

Value | Count | Frequency (%)
250000 | 1 | 0.2%
200000 | 3 | 0.7%
190000 | 1 | 0.2%
180000 | 1 | 0.2%
155000 | 1 | 0.2%

Kilpailukykyinen
Boolean

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.5%
Missing: 14
Missing (%): 3.3%
Memory size: 3.5 KiB

Value | Count
True | 291
False | 123
(Missing) | 14

Value | Count | Frequency (%)
True | 291 | 68.0%
False | 123 | 28.7%
(Missing) | 14 | 3.3%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 67
Distinct (%): 68.4%
Missing: 330
Missing (%): 77.1%
Memory size: 3.5 KiB

Value | Count
Gofore | 11
Vincit | 6
Fraktio | 4
Futurice | 4
Arado | 3
Other values (62) | 70

Length

Max length: 132
Median length: 7
Mean length: 10.59183673
Min length: 2

Characters and Unicode

Total characters: 1038
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 56
Unique (%): 57.1%

Sample

1st row: Questrade
2nd row: Digia Oyj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Value | Count | Frequency (%)
Gofore | 11 | 2.6%
Vincit | 6 | 1.4%
Fraktio | 4 | 0.9%
Futurice | 4 | 0.9%
Arado | 3 | 0.7%
Mavericks | 3 | 0.7%
Pankki | 3 | 0.7%
KVTES-alainen kunnan omistama oy | 2 | 0.5%
Qvik | 2 | 0.5%
Gofore Oyj | 2 | 0.5%
Other values (57) | 58 | 13.6%
(Missing) | 330 | 77.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
gofore | 13 | 8.4%
oy | 9 | 5.8%
vincit | 6 | 3.9%
fraktio | 4 | 2.6%
mavericks | 4 | 2.6%
oyj | 4 | 2.6%
futurice | 4 | 2.6%
pankki | 3 | 1.9%
siili | 3 | 1.9%
arado | 3 | 1.9%
Other values (89) | 102 | 65.8%

Most occurring characters

ValueCountFrequency (%)
i104
 
10.0%
a82
 
7.9%
o79
 
7.6%
e73
 
7.0%
t71
 
6.8%
60
 
5.8%
r57
 
5.5%
n50
 
4.8%
l41
 
3.9%
u40
 
3.9%
Other values (43)381
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter842
81.1%
Uppercase Letter130
 
12.5%
Space Separator60
 
5.8%
Other Punctuation3
 
0.3%
Dash Punctuation3
 
0.3%

Most frequent character per category

ValueCountFrequency (%)
G14
 
10.8%
O13
 
10.0%
V12
 
9.2%
S11
 
8.5%
F9
 
6.9%
A7
 
5.4%
K7
 
5.4%
C6
 
4.6%
P6
 
4.6%
E5
 
3.8%
Other values (15)40
30.8%
ValueCountFrequency (%)
i104
12.4%
a82
 
9.7%
o79
 
9.4%
e73
 
8.7%
t71
 
8.4%
r57
 
6.8%
n50
 
5.9%
l41
 
4.9%
u40
 
4.8%
k40
 
4.8%
Other values (15)205
24.3%
ValueCountFrequency (%)
60
100.0%
ValueCountFrequency (%)
.3
100.0%
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin972
93.6%
Common66
 
6.4%

Most frequent character per script

ValueCountFrequency (%)
i104
 
10.7%
a82
 
8.4%
o79
 
8.1%
e73
 
7.5%
t71
 
7.3%
r57
 
5.9%
n50
 
5.1%
l41
 
4.2%
u40
 
4.1%
k40
 
4.1%
Other values (40)335
34.5%
ValueCountFrequency (%)
60
90.9%
.3
 
4.5%
-3
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1026
98.8%
None12
 
1.2%

Most frequent character per block

ValueCountFrequency (%)
i104
 
10.1%
a82
 
8.0%
o79
 
7.7%
e73
 
7.1%
t71
 
6.9%
60
 
5.8%
r57
 
5.6%
n50
 
4.9%
l41
 
4.0%
u40
 
3.9%
Other values (41)369
36.0%
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 32
Distinct (%): 97.0%
Missing: 395
Missing (%): 92.3%
Memory size: 3.5 KiB

Value | Count
palkan lisänä lounas- ja virkistysetu | 2
Työskentelen opintojen ohella, ensimmäisessä frontend devaajan työssä. Olen opiskellut reilu 2 vuotta yliopistossa. Palkkani on mielestäni nyt ihan ok, mutta tarkoituksena nostaa sitä 3000e /kk loppukesään mennessä. | 1
Vuositulot pitää sisällään myös sivutoimisena tehtyä pientä laskutusta. | 1
Ihan OK. Edut myös kovat. | 1
saispa lisää liksaa | 1
Other values (27) | 27

Length

Max length: 286
Median length: 71
Mean length: 95.54545455
Min length: 7

Characters and Unicode

Total characters: 3153
Distinct characters: 55
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 93.9%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Value | Count | Frequency (%)
palkan lisänä lounas- ja virkistysetu | 2 | 0.5%
Työskentelen opintojen ohella, ensimmäisessä frontend devaajan työssä. Olen opiskellut reilu 2 vuotta yliopistossa. Palkkani on mielestäni nyt ihan ok, mutta tarkoituksena nostaa sitä 3000e /kk loppukesään mennessä. | 1 | 0.2%
Vuositulot pitää sisällään myös sivutoimisena tehtyä pientä laskutusta. | 1 | 0.2%
Ihan OK. Edut myös kovat. | 1 | 0.2%
saispa lisää liksaa | 1 | 0.2%
Sijainti Pori, mutta etätöitä 100%. Varsinainen positio Tampere - Helsinki. Edut aika huonot, perusjutut. Työ itsessään aika masentavaa. Seuraavaksi varmaan freelance/yrittäjyys. | 1 | 0.2%
Startup | 1 | 0.2%
palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen. | 1 | 0.2%
Vaikka merkitsin, että palkkani ei ole mielestäni kilpailukykyinen, se ei tarkoita ettenkö olisi siihen tyytyväinen. Tilanne yrittäjillä ei yleensä vastaa samaa kuin palkansaajilla, joten palkka ei ole yrittäjille monestikaan niin mustavalkoinen asia vaan kysymys on isommasta kuviosta. | 1 | 0.2%
Vaikea vastata henkilönä joka tekee yrityksen kautta yhdelle ulkomaalaiselle yritykselle töitä (jolla ei ole entiteettiä suomessa). Vastasin nyt ikään kuin olisin yrittäjä vaikka käytännössä tämä on sama kuin olisin palkkaduunissa. | 1 | 0.2%
Other values (22) | 22 | 5.1%
(Missing) | 395 | 92.3%

Histogram of lengths of the category

Value | Count | Frequency (%)
ei | 10 | 2.5%
palkka | 9 | 2.3%
on | 8 | 2.0%
mutta | 6 | 1.5%
ole | 6 | 1.5%
ja | 6 | 1.5%
nyt | 4 | 1.0%
firman | 4 | 1.0%
joten | 4 | 1.0%
palkan | 4 | 1.0%
Other values (281) | 334 | 84.6%

Most occurring characters

ValueCountFrequency (%)
365
11.6%
a331
10.5%
i271
 
8.6%
t247
 
7.8%
n216
 
6.9%
s205
 
6.5%
e203
 
6.4%
k185
 
5.9%
l161
 
5.1%
o146
 
4.6%
Other values (45)823
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2625
83.3%
Space Separator365
 
11.6%
Other Punctuation74
 
2.3%
Uppercase Letter47
 
1.5%
Decimal Number27
 
0.9%
Dash Punctuation6
 
0.2%
Open Punctuation3
 
0.1%
Close Punctuation3
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
a331
12.6%
i271
10.3%
t247
9.4%
n216
 
8.2%
s205
 
7.8%
e203
 
7.7%
k185
 
7.0%
l161
 
6.1%
o146
 
5.6%
u118
 
4.5%
Other values (14)542
20.6%
ValueCountFrequency (%)
T7
14.9%
P7
14.9%
O6
12.8%
E6
12.8%
V6
12.8%
S4
8.5%
K3
6.4%
I2
 
4.3%
H2
 
4.3%
R1
 
2.1%
Other values (3)3
6.4%
ValueCountFrequency (%)
015
55.6%
13
 
11.1%
52
 
7.4%
22
 
7.4%
82
 
7.4%
62
 
7.4%
31
 
3.7%
ValueCountFrequency (%)
.38
51.4%
,23
31.1%
/5
 
6.8%
%4
 
5.4%
"2
 
2.7%
?2
 
2.7%
ValueCountFrequency (%)
365
100.0%
ValueCountFrequency (%)
(3
100.0%
ValueCountFrequency (%)
)3
100.0%
ValueCountFrequency (%)
+3
100.0%
ValueCountFrequency (%)
-6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2672
84.7%
Common481
 
15.3%

Most frequent character per script

ValueCountFrequency (%)
a331
12.4%
i271
10.1%
t247
9.2%
n216
 
8.1%
s205
 
7.7%
e203
 
7.6%
k185
 
6.9%
l161
 
6.0%
o146
 
5.5%
u118
 
4.4%
Other values (27)589
22.0%
ValueCountFrequency (%)
365
75.9%
.38
 
7.9%
,23
 
4.8%
015
 
3.1%
-6
 
1.2%
/5
 
1.0%
%4
 
0.8%
(3
 
0.6%
)3
 
0.6%
+3
 
0.6%
Other values (8)16
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3017
95.7%
None136
 
4.3%

Most frequent character per block

ValueCountFrequency (%)
365
12.1%
a331
11.0%
i271
 
9.0%
t247
 
8.2%
n216
 
7.2%
s205
 
6.8%
e203
 
6.7%
k185
 
6.1%
l161
 
5.3%
o146
 
4.8%
Other values (43)687
22.8%
ValueCountFrequency (%)
ä112
82.4%
ö24
 
17.6%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
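
The calculation described above can be sketched in plain Python as follows. This is only an illustration of the formula, not the code the report was generated with; the function name `pearson_r` and the example values are made up for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / math.sqrt(var_x * var_y)


experience = [2, 5, 9, 13, 21]            # made-up values, not taken from the dataset
salary = [3000, 3900, 4500, 5500, 7000]   # made-up values, not taken from the dataset
r = pearson_r(experience, salary)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(experience, [12 * s + 1000 for s in salary])
print(round(r, 4), round(r_rescaled, 4))
```

The second call illustrates the invariance under changes in location and scale: converting the made-up monthly figures to shifted annual ones does not change r.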

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
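
A minimal sketch of this rank-based calculation, assuming tied values receive the average of their rank positions (a common convention, not confirmed by the report). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # monotonic but nonlinear in xs
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the squared values in the example, ρ reaches the maximum because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.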

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
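
The pair-counting definition can be sketched directly. Note that this follows the formula exactly as stated in the text (often called tau-a), which applies no correction for ties; statistical libraries frequently compute a tie-corrected variant instead, so results may differ on data with many ties. The function name `kendall_tau` is illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau as (concordant - discordant) / total number of pairs, without tie correction."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0 means a tie in x or y; the pair counts as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the three-point example, two pairs are concordant and one is discordant, so τ is (2 - 1) / 3.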

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
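
As a hedged sketch of a bias-corrected Cramér's V, the function below applies the correction commonly attributed to Bergsma (2013) to a contingency table given as a list of rows. The function name `cramers_v_corrected` and the example tables are assumptions for illustration; the report's own implementation may differ in details.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of equally long rows."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    if n < 2:
        raise ValueError("need at least two observations")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson's chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


independent = [[10, 10], [10, 10]]  # made-up counts: no association
perfect = [[50, 0], [0, 50]]        # made-up counts: perfect association
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two made-up tables mark the ends of the scale: no association for the uniform table and perfect association for the diagonal one.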

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
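
To make the nullity count and nullity correlation concrete, here is a minimal sketch using three columns from the report's first ten sample rows, with `None` standing in for NaN. It assumes that nullity correlation means the Pearson correlation between 0/1 missingness indicators; the report's plots are likely produced by a dedicated library, so this only illustrates the idea. The function names are hypothetical.

```python
import math

# Three columns from the report's first ten sample rows; None marks a missing value (NaN).
columns = {
    "Sukupuoli": [None, "mies", "mies", "mies", "mies", "mies", "mies", None, "mies", "mies"],
    "Etä": ["50/50", "Etä", "Etä", "Etä", "Etä", None, "Etä", None, None, "Etä"],
    "Kuukausipalkka": [6500.0, 5000.0, 2475.0, 4300.0, 3000.0, 8000.0, 6000.0, 4250.0, 4000.0, None],
}


def nullity_counts(cols):
    """Number of missing values per column."""
    return {name: sum(v is None for v in values) for name, values in cols.items()}


def nullity_correlation(a, b):
    """Pearson correlation between the 0/1 missingness indicators of two columns."""
    xa = [1.0 if v is None else 0.0 for v in a]
    xb = [1.0 if v is None else 0.0 for v in b]
    n = len(xa)
    mean_a, mean_b = sum(xa) / n, sum(xb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(xa, xb))
    var_a = sum((p - mean_a) ** 2 for p in xa)
    var_b = sum((q - mean_b) ** 2 for q in xb)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


counts = nullity_counts(columns)
corr = nullity_correlation(columns["Sukupuoli"], columns["Etä"])
print(counts, round(corr, 3))
```

A positive value means the two columns tend to be missing in the same rows; with only ten rows the estimate is of course very rough.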

Sample

First rows

Index | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN
1 | 2021-02-15 11:57:19.676 | Turku | 31-35 v | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN
3 | 2021-02-15 11:58:15.261 | Tampere | 31-35 v | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 26-30 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 41-45 v | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 8000.0 | 100000.0 | True | NaN | NaN
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 31-35 v | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN
7 | 2021-02-15 12:00:04.655 | Tampere | 31-35 v | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | NaN | 4250.0 | 54000.0 | True | NaN | NaN
8 | 2021-02-15 12:01:00.769 | Tampere | 31-35 v | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | NaN | 4000.0 | 50000.0 | False | NaN | NaN
9 | 2021-02-15 12:02:03.577 | Tallinna | 31-35 v | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN

Last rows

Index | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana
418 | 2021-02-19 16:26:32.700 | PK-Seutu | 36-40 v | mies | 16.0 | Työntekijä / palkollinen | 1.0 | Mobile SW | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
419 | 2021-02-19 16:33:27.762 | PK-Seutu | 31-35 v | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Full stack | 50/50 | 7000.0 | 87500.0 | True | Mavericks | NaN
420 | 2021-02-19 16:34:07.545 | PK-Seutu | 31-35 v | mies | 12.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 8000.0 | 95000.0 | True | Mavericks | NaN
421 | 2021-02-19 16:36:55.938 | Tampere | 41-45 v | mies | 22.0 | Työntekijä / palkollinen | 0.8 | ohjelmistokehittäjä (backend) / arkkitehti | Etä | 4700.0 | 58750.0 | False | NaN | NaN
422 | 2021-02-19 16:38:41.403 | PK-Seutu | 36-40 v | mies | 2.0 | Työntekijä / palkollinen | 1.0 | WordPress-kehittäjä | 50/50 | 3000.0 | 37500.0 | False | NaN | NaN
423 | 2021-02-19 16:39:14.831 | Tampere | 31-35 v | mies | 5.0 | Työntekijä / palkollinen | 1.0 | Data scientist | Etä | 4300.0 | 53750.0 | NaN | Wapice | NaN
424 | 2021-02-19 16:48:04.696 | PK-Seutu | 41-45 v | mies | 15.0 | Työntekijä / palkollinen | 1.0 | ohjelmistokehittäjä | 50/50 | NaN | 100000.0 | True | NaN | NaN
425 | 2021-02-19 16:54:30.691 | Turku | 36-40 v | mies | 13.0 | Työntekijä / palkollinen | 1.0 | Lead Software Engineer | NaN | 5500.0 | 75000.0 | True | NaN | NaN
426 | 2021-02-19 17:13:18.923 | PK-Seutu | 36-40 v | mies | 15.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 6000.0 | 76000.0 | True | NaN | NaN
427 | 2021-02-19 17:51:37.178 | PK-Seutu | 21-25 v | nainen | 4.0 | Työntekijä / palkollinen | 1.0 | Frontend ohjelmistokehittäjä | Etä | 4000.0 | 55000.0 | True | NaN | NaN