Introduction

susR provides an R interface to the Slovak Statistical Office (SUSR) open data API, making it easier to:

  • List available tables (and their dimension codes).
  • Retrieve metadata for those tables (e.g., domain, subdomain).
  • Fetch the data itself in JSON-stat format.
  • Convert the data into user-friendly tibbles ready for analysis.

This vignette guides you through a basic workflow: from installing susR to fetching data.

Installation If you haven’t already installed susR, you can do so from GitHub:

devtools::install_github("Arnold-Kakas/susR")

Then load it:

Listing Available Tables

There are two options how to retrieve available tables.

First is calling susr_domains() functions which reads saved list of available tables, their names, domain and subdomains:

susr_domains_table <- susr_domains()
head(susr_domains_table)
#> # A tibble: 6 × 4
#>   table_code table_name                                         domain subdomain
#>   <chr>      <chr>                                              <chr>  <chr>    
#> 1 as1001rs   Population and attributes of age                   Indic… Indicato…
#> 2 as1002rs   Participation of seniors in the labour market      Indic… Indicato…
#> 3 as1003rs   Net money income and expenditure of households by… Indic… Indicato…
#> 4 as1004rs   Expenditure of households on selected goods and s… Indic… Indicato…
#> 5 as1005rs   At-risk-of-poverty rate by selected age group and… Indic… Indicato…
#> 6 as1006rs   Social and pension security                        Indic… Indicato…

Second is to call the SUSR API:

susr_tables <- susr_tables()
head(susr_tables)
#> # A tibble: 6 × 6
#>   class   href                           table_code label update dimension_names
#>   <chr>   <chr>                          <chr>      <chr> <chr>  <chr>          
#> 1 dataset https://data.statistics.sk/ap… as1001rs   Popu… 2024-… as1001rs_rok:a…
#> 2 dataset https://data.statistics.sk/ap… as1002rs   Part… 2024-… as1002rs_rok:a…
#> 3 dataset https://data.statistics.sk/ap… as1003rs   Net … 2024-… as1003rs_rok:a…
#> 4 dataset https://data.statistics.sk/ap… as1004rs   Expe… 2024-… as1004rs_rok:a…
#> 5 dataset https://data.statistics.sk/ap… as1005rs   At-r… 2024-… as1005rs_rok:a…
#> 6 dataset https://data.statistics.sk/ap… as1006rs   Soci… 2024-… as1006rs_rok:a…

This returns a tibble with columns like table_code, href, dimension_names, label, and update.

If you only want certain domains or subdomains, you can do it by passing one (or more) domains and/or subdomains from susr_domains() result:

# Example: filter by domain = "Demographic and social statistics"
domain_tables <- susr_tables(domain = "Demographic and social statistics")

dim(domain_tables)
#> [1] 231   9
subdomain_tables <- susr_tables(domain = "Housing")

dim(subdomain_tables)
#> [1] 2 9
multiple_subdomain_tables <- susr_tables(domain = c("Housing", "Energy"))

dim(multiple_subdomain_tables)
#> [1] 12  9

You can pass parameter long = TRUE in order to fetch data in long format where each row contain only one dimension.

subdomain_tables_long <- susr_tables(domain = "Housing",
                                     long = TRUE)

dim(subdomain_tables_long)
#> [1] 10  8

Discovering Dimensions

You can see what valid dimension values by using susr_dimension_values().

dims <- susr_dimension_values(table_code = "bv3001rr",
                              dimension_code = "bv3001rr_voda")

dims
#> # A tibble: 4 × 6
#>   dimension_code dimension_label dimension_note element_index element_value
#>   <chr>          <chr>           <chr>                  <int> <chr>        
#> 1 bv3001rr_voda  bv3001rr_voda   Water supply               0 VODATOT      
#> 2 bv3001rr_voda  bv3001rr_voda   Water supply               1 VODA01       
#> 3 bv3001rr_voda  bv3001rr_voda   Water supply               2 VODA02       
#> 4 bv3001rr_voda  bv3001rr_voda   Water supply               3 VODA03       
#> # ℹ 1 more variable: element_label <chr>

You can map this function to susr_tables() output in long format.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(tidyr)

# 1) Get all tables in the "Energy" domain, in long format
energy_long <- susr_tables(domain = "Energy", long = TRUE)

# Inspect the columns:
# - table_code
# - dimension_code
# - plus other metadata (domain, subdomain, label, href, etc.)

# 2) Map susr_dimension_values() across each row:
#    We'll create a new list-column 'dim_info' that stores the dimension values
energy_long_dim <- energy_long |> 
  mutate(
    dim_info = map2(table_code, dimension_code, ~ susr_dimension_values(.x, .y))
  )

# Now `energy_long_dim$dim_info` is a list of tibbles/data frames (one for each dimension).
# Each entry holds columns like:
#   dimension_code, dimension_label, dimension_note, element_code, element_label, etc.

# 3) (Optional) Unnest into a single combined data frame
#    If you'd like a tall data frame where each row is a unique dimension element:
full_dim_info <- energy_long_dim |> 
  unnest(cols = dim_info, names_sep = "dim_")

# `full_dim_info` now contains columns:
#   class, href, table_code, label, update, dimension_code, 
#   (optionally) domain/subdomain, 
#   and the un-nested dimension columns from susr_dimension_values().

head(full_dim_info)
#> # A tibble: 6 × 14
#>   class   href           table_code label update dimension_code domain subdomain
#>   <chr>   <chr>          <chr>      <chr> <chr>  <chr>          <chr>  <chr>    
#> 1 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> 2 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> 3 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> 4 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> 5 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> 6 dataset https://data.… en1001ms   Sour… 2024-… en1001ms_rok   Multi… Energy   
#> # ℹ 6 more variables: dim_infodim_dimension_code <chr>,
#> #   dim_infodim_dimension_label <chr>, dim_infodim_dimension_note <chr>,
#> #   dim_infodim_element_index <int>, dim_infodim_element_value <chr>,
#> #   dim_infodim_element_label <chr>

Fetching Data

Once you know the table code and dimension segments, use fetch_susr_data(). If you want to fetch data from multiple tables multiple pairs of table codes and dimension lists:

params <- list(
  "np3106rr",
  list("SK021", c("2016","2017","2018"), "E_PRIEM_HR_MZDA", "all"),
  "as1001rs",
  list("all", "all", "all")
)

multi_res <- fetch_susr_data(params)

names(multi_res)
#> [1] "np3106rr" "as1001rs"

class(multi_res$np3106rr)
#> [1] "tbl_df"     "tbl"        "data.frame"

Each element in multi_res is a tibble for given table code.

Example Analysis

Once you have the data, use dplyr and ggplot2 (or anything else) to analyze:

library(ggplot2)

p <- multi_res$as1001rs |>
  filter(as1001rs_ukaz == "Pre-productive population (aged 0 -14 years)",
         as1001rs_poh == "Total") |>
  ggplot() +
  geom_line(aes(x = as1001rs_rok, y = value, group = 1)) +
  labs(title = "Pre-productive population (aged 0 -14 years)",
       x = "Year",
       y = NULL) +
  theme_minimal()


p