2. Quick start with the STELAR client

The STELAR client is a Python library that allows you either to interact with the STELAR KLMS either via a command line, such as a Jupyter notebook or an IPython shell, or, it can be used as a library in your Python code, for example in a GUI built using streamlit <https://streamlit.io/>.

In this quick start guide, we will show you how to install the client, how to configure it, and how to use it to interact with the STELAR KLMS.

2.1. Installation

The client is available on PyPI, so you can install it using pip:

pip install stelar_client

2.2. Configuration

Before you can use the client, you need to configure it. The client uses a configuration file to store the connection details to the STELAR KLMS. The configuration file is a config file that declares a number of contexts. Each context is a set of connection details to a specific instance of the STELAR KLMS.

The default location of the configuration file is in the user’s home directory, and by default it is called .stelar. So, if you are on a Unix-like system, the configuration file will be ~/.stelar.

Here is an example of a configuration file:

[default]
url = https://klms.stelar.gr
username = myuser
password = mypassword

[dev]
url = https://dev.stelar.example.com
username = mydevuser
password = mydevpassword

This file declares two contexts: default and dev. The default context is the default context that the client will use if no context is specified. The dev context is an example of a context that you can use to connect to a development instance of the STELAR KLMS.

2.3. Creating a client

Once you have installed the client and created a configuration file, you can start using the client. The first step is to import the client and create a client object. Here is an example of how to do this:

from stelar.client import Client

client = Client()

This will create a client object that is connected to the default context in the configuration file. If you want to connect to a different context, you can specify the context when creating the client object:

client = Client(context='dev')
# same as  client = Client('dev')

2.3.1. Accessing the KLMS

Once you have created the client object, you can start using it to interact with the STELAR KLMS. Let us start by looking at a list of available datasets:

client.datasets[:]
Out[3]: Dataset['em_test_dataset', 'nyse_stock_dataset', 'shakespeare_novels', 'stock_movements_nyse', 'synopses_experiment', 'synopses_experiment_2', 'word_count_results']

This will return a list of datasets that are available in the STELAR KLMS. Assume that we are interested in the NYSE stock dataset. We can get the dataset as follows:

nyse_stock_dataset = client.datasets['nyse_stock_dataset']
nyse_stock_dataset
Out[5]: <Dataset nyse_stock_dataset CLEAN>

This will return a proxy object that represents the NYSE stock dataset. Via this proxy object, we can examine the dataset, download it, or upload new data to it. For example, we can get a list of the columns in the dataset:

nyse_stock_dataset.sl
Out[6]:
author                                                           admin
author_email                                            info@stelar.gr
creator                           f04457e8-2cad-4893-ae30-4ac2f432df0e
extras                                                              {}
groups                                                              ()
id                                7c67f766-e839-441a-98f2-3b3e5fcf62a5
maintainer                                                        vsam
maintainer_email                                                  None
metadata_created                            2025-02-12 09:16:25.598958
metadata_modified                           2025-03-13 11:56:23.441754
name                                                nyse_stock_dataset
notes                A collection of 1 year long historical data of...
organization                                               stelar-klms
private                                                          False
resources            (Resource ID: f3579502-2113-4821-9bf4-9d540c12...
spatial                                                           None
state                                                           active
tags                                   (AAPL, NVDA, NYSE, SDE, Stocks)
title                                               NYSE Stock Dataset
type                                                           dataset
url                                                          stelar.de
version                                                          0.0.3
Name: Dataset (CLEAN), dtype: object

This will return a list of metadata fields for the dataset. The metadata is displayed as a pandas Series object, which is useful for interactive exploration of the dataset as a whole. In a programmatic context, you can access the metadata fields as attributes of the proxy object:

nyse_stock_dataset.title
Out[7]: 'NYSE Stock Dataset'

You can also update the metadata fields of the dataset:

nyse_stock_dataset.title = 'NYSE Stock Dataset 2025'
nyse_stock_dataset.title
Out[9]: 'NYSE Stock Dataset 2025'

You can examine resources that this dataset may contain:

nyse_stock_dataset.resources
Out[10]: Resource[UUID('f3579502-2113-4821-9bf4-9d540c129b31'), UUID('e5bf830e-3b21-4fae-9991-d90486b5d06e')]

The value returned is proxy list, that is, a list-like object that can be used to access proxy objects. For example, to get the proxy object for the first resource in the list, you can do:

nyse_stock_dataset.resources[0]
Out[11]: <Resource f3579502-2113-4821-9bf4-9d540c129b31 CLEAN>

We can examine this proxy object like we did with the dataset proxy object:

nyse_stock_dataset.resources[0].sl
Out[12]:
_extras                {'datastore_active': False, 'relation': 'owned'}
cache_last_updated                                                 None
cache_url                                                          None
created                                      2025-02-12 09:17:10.411495
dataset                                              nyse_stock_dataset
description
format                                                              CSV
hash
id                                 f3579502-2113-4821-9bf4-9d540c129b31
last_modified                                                      None
metadata_modified                            2025-02-12 09:17:40.649473
mimetype                                                       text/csv
mimetype_inner                                                     None
name                                              AAPL 1y Stock History
position                                                              0
resource_type                                                      None
size                                                               None
state                                                            active
url                   s3://klms-bucket/raw-data/stocks/aapl_intraday...
url_type                                                           None

This is quite a lot of information, but a resource is mainly about files in the KLMS. This particular resource is a CSV file that contains the 1-year historical stock data for the company AAPL. The url field contains the location of the file in the KLMS.

2.3.2. Working with data

The last line of the previous example shows the location of the file in the KLMS. Also, we note that the format attribute indicates that the file is in CSV format. We can download this file and load it into a pandas DataFrame as follows:

In [1]: df = r.read_dataframe()
In [2]: df.head()
Out[7]:
                        Date Ticker        Open  ...    Volume  Dividends  Stock Splits
0  2024-02-12 00:00:00-05:00   AAPL  187.534482  ...  41781900        0.0           0.0
1  2024-02-13 00:00:00-05:00   AAPL  184.896960  ...  56529500        0.0           0.0
2  2024-02-14 00:00:00-05:00   AAPL  184.449076  ...  54630500        0.0           0.0
3  2024-02-15 00:00:00-05:00   AAPL  182.687400  ...  65434500        0.0           0.0
4  2024-02-16 00:00:00-05:00   AAPL  182.557985  ...  49701400        0.0           0.0

[5 rows x 9 columns]

Let us compute some simple statistics on this trivial dataset and then publsh the result. First, let us compude a dataset with some averages:

In [1]: outdf = df.groupby(pd.to_datetime(df.Date,utc=True).dt.year).Open.agg(['mean','min','max'])
Out[1]: outdf
            mean         min         max
Date
2024  208.790457  164.572913  257.906429
2025  233.446786  219.548596  248.656607

We can now publish this limited dataframe under the same dataset.

In [2]: newrsrc = nyse_stock_dataset.add_dataframe(outdf, "s3://klms-bucket/stock_averages.parquet")
In [3]: newrsrc.sxl
Out[3]:
cache_last_updated                                                 None
cache_url                                                          None
columns                                          ['mean', 'min', 'max']
created                                      2025-03-17 23:34:09.448511
dataset                                              nyse_stock_dataset
datastore_active                                                  False
description           {"mean":{"count":2.0,"mean":221.1186218583,"st...
format                                                          parquet
hash
id                                 40182a61-4e49-415f-980c-946fa7831380
last_modified                                                      None
metadata_modified                            2025-03-17 23:34:09.417234
mimetype                                            application/parquet
mimetype_inner                                                     None
name                                                     stock_averages
position                                                              3
relation                                                          owned
resource_type                                                      None
rows                                                                  2
size                                                                 56
state                                                            active
url                             s3://klms-bucket/stock_averages.parquet
url_type                                                           None
Name: Resource (CLEAN), dtype: object

The simple dataframe was added as a new resource under the same dataset. The resource is a parquet file stored in the KLMS. The add_dataframe method is a convenience method that allows you to add a pandas DataFrame to the KLMS as a new resource. The method takes the DataFrame, and an S3 URL where the file will be stored in the KLMS.

Let us clean up our new resource and dataset:

In [4]: newrsrc.delete()
In [5]: client.s3fs().rm("s3://klms-bucket/stock_averages.parquet")

There are many more capabilities and features to explore in the STELAR client. You can refer to the rest of the documentation for more information on how to use the client, both interactively and programmatically.