.. _quickstart-guide:

===================================
Quick start with the STELAR client
===================================

The STELAR client is a Python library for working with the STELAR KLMS.
You can use it interactively from a command line, such as a Jupyter
notebook or an IPython shell, or import it as a library in your own
Python code, for example in a GUI built with Streamlit.

In this quick start guide, we will show you how to install the client,
how to configure it, and how to use it to interact with the STELAR KLMS.

Installation
============

The client is available on PyPI, so you can install it using pip:

.. code-block:: bash

   pip install stelar_client

Configuration
=============

Before you can use the client, you need to configure it.
The client uses a configuration file to store the connection details
to the STELAR KLMS. The configuration file is an INI-style file that
declares a number of :dfn:`contexts`. Each context is a set of
connection details to a specific instance of the STELAR KLMS.

By default, the configuration file is called ``.stelar`` and is located
in the user's home directory. On a Unix-like system, that is ``~/.stelar``.

Here is an example of a configuration file:

.. code-block:: ini

   [default]
   url = https://klms.stelar.gr
   username = myuser
   password = mypassword

   [dev]
   url = https://dev.stelar.example.com
   username = mydevuser
   password = mydevpassword

This file declares two contexts: ``default`` and ``dev``. The client uses
the ``default`` context when no context is specified. The ``dev`` context
shows how you might connect to a development instance of the STELAR KLMS.
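
To make the idea of contexts concrete, the following sketch parses the
example above with Python's standard ``configparser`` module and lists the
contexts it declares. It only illustrates the file layout: it assumes that
the file follows plain INI rules, and the client's own loading code may
differ.

.. code-block:: python

   import configparser

   # The example configuration from above, inlined so the sketch is self-contained.
   EXAMPLE = """
   [default]
   url = https://klms.stelar.gr
   username = myuser
   password = mypassword

   [dev]
   url = https://dev.stelar.example.com
   username = mydevuser
   password = mydevpassword
   """

   parser = configparser.ConfigParser()
   parser.read_string(EXAMPLE)

   # Each section name corresponds to one context.
   contexts = parser.sections()
   print(contexts)

   # Connection details of a single context, as a plain dict.
   dev = dict(parser["dev"])
   print(dev["url"], dev["username"])

Note that ``configparser`` treats only the upper-case section name
``DEFAULT`` specially, so the lower-case ``default`` here is parsed as an
ordinary section.
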

Creating a client
=================

Once you have installed the client and created a configuration file,
you can start using the client. The first step is to import the client
and create a client object. Here is an example of how to do this:

.. code-block:: python

   from stelar.client import Client

   client = Client()

This creates a client object that is connected to the ``default`` context
in the configuration file. If you want to connect to a different context,
you can specify the context when creating the client object:

.. code-block:: python

   client = Client(context='dev')
   # same as client = Client('dev')

Accessing the KLMS
------------------

Once you have created the client object, you can start using it to interact
with the STELAR KLMS. Let us start by looking at a list of available datasets:

.. code-block:: pycon

   In [3]: client.datasets[:]
   Out[3]: Dataset['em_test_dataset', 'nyse_stock_dataset', 'shakespeare_novels', 'stock_movements_nyse', 'synopses_experiment', 'synopses_experiment_2', 'word_count_results']

This returns a list of the datasets that are available in the STELAR KLMS.
Assume that we are interested in the NYSE stock dataset. We can get the
dataset as follows:

.. code-block:: pycon

   In [4]: nyse_stock_dataset = client.datasets['nyse_stock_dataset']

   In [5]: nyse_stock_dataset
   Out[5]:

This returns a *proxy object* that represents the NYSE stock dataset.
Via this proxy object, we can examine the dataset, download it, or upload
new data to it. For example, we can display the dataset's metadata:

.. code-block:: pycon

   In [6]: nyse_stock_dataset.sl
   Out[6]:
   author               admin
   author_email         info@stelar.gr
   creator              f04457e8-2cad-4893-ae30-4ac2f432df0e
   extras               {}
   groups               ()
   id                   7c67f766-e839-441a-98f2-3b3e5fcf62a5
   maintainer           vsam
   maintainer_email     None
   metadata_created     2025-02-12 09:16:25.598958
   metadata_modified    2025-03-13 11:56:23.441754
   name                 nyse_stock_dataset
   notes                A collection of 1 year long historical data of...
   organization         stelar-klms
   private              False
   resources            (Resource ID: f3579502-2113-4821-9bf4-9d540c12...
   spatial              None
   state                active
   tags                 (AAPL, NVDA, NYSE, SDE, Stocks)
   title                NYSE Stock Dataset
   type                 dataset
   url                  stelar.de
   version              0.0.3
   Name: Dataset (CLEAN), dtype: object

This displays the metadata fields of the dataset as a *pandas Series object*,
which is useful for interactive exploration of the dataset as a whole.
In a programmatic context, you can access the metadata fields as attributes
of the proxy object:

.. code-block:: pycon

   In [7]: nyse_stock_dataset.title
   Out[7]: 'NYSE Stock Dataset'
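
During interactive exploration, a Series like the one above can be handled
with ordinary pandas operations. The following minimal sketch builds a
stand-in Series by hand from a few of the values shown above, so that it
runs without a KLMS connection. It assumes, without confirmation from this
guide, that the object displayed by ``.sl`` behaves like a regular
``pandas.Series``.

.. code-block:: python

   import pandas as pd

   # Stand-in for the metadata Series shown above, built by hand for illustration.
   meta = pd.Series(
       {
           "name": "nyse_stock_dataset",
           "title": "NYSE Stock Dataset",
           "state": "active",
           "version": "0.0.3",
       },
       name="Dataset",
   )

   # Look up a single field by its label.
   print(meta["title"])

   # Select a few fields and convert them to a plain dict.
   selected = meta[["name", "version"]].to_dict()
   print(selected)
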

You can also update the metadata fields of the dataset:

.. code-block:: pycon

   In [8]: nyse_stock_dataset.title = 'NYSE Stock Dataset 2025'

   In [9]: nyse_stock_dataset.title
   Out[9]: 'NYSE Stock Dataset 2025'

You can examine the resources that this dataset contains:

.. code-block:: pycon

   In [10]: nyse_stock_dataset.resources
   Out[10]: Resource[UUID('f3579502-2113-4821-9bf4-9d540c129b31'), UUID('e5bf830e-3b21-4fae-9991-d90486b5d06e')]

The value returned is a **proxy list**, that is, a list-like object that
can be used to access proxy objects. For example, to get the **proxy object**
for the first resource in the list, you can do:

.. code-block:: pycon

   In [11]: nyse_stock_dataset.resources[0]
   Out[11]:

We can examine this proxy object just as we did with the dataset proxy object:

.. code-block:: pycon

   In [12]: nyse_stock_dataset.resources[0].sl
   Out[12]:
   _extras               {'datastore_active': False, 'relation': 'owned'}
   cache_last_updated    None
   cache_url             None
   created               2025-02-12 09:17:10.411495
   dataset               nyse_stock_dataset
   description
   format                CSV
   hash
   id                    f3579502-2113-4821-9bf4-9d540c129b31
   last_modified         None
   metadata_modified     2025-02-12 09:17:40.649473
   mimetype              text/csv
   mimetype_inner        None
   name                  AAPL 1y Stock History
   position              0
   resource_type         None
   size                  None
   state                 active
   url                   s3://klms-bucket/raw-data/stocks/aapl_intraday...
   url_type              None

This is quite a lot of information, but a resource mainly describes a file in
the KLMS. This particular resource is a CSV file that contains one year of
historical stock data for AAPL. The ``url`` field contains the
location of the file in the KLMS.

Working with data
-----------------

The previous example shows where the file is stored in the KLMS (the ``url``
field) and that it is in CSV format (the ``format`` field).
We can download this file and load it into a **pandas DataFrame** as follows:

.. code-block:: pycon

   In [1]: r = nyse_stock_dataset.resources[0]  # the resource examined above
      ...: df = r.read_dataframe()

   In [2]: df.head()
   Out[2]:
                           Date  Ticker        Open  ...    Volume  Dividends  Stock Splits
   0  2024-02-12 00:00:00-05:00    AAPL  187.534482  ...  41781900        0.0           0.0
   1  2024-02-13 00:00:00-05:00    AAPL  184.896960  ...  56529500        0.0           0.0
   2  2024-02-14 00:00:00-05:00    AAPL  184.449076  ...  54630500        0.0           0.0
   3  2024-02-15 00:00:00-05:00    AAPL  182.687400  ...  65434500        0.0           0.0
   4  2024-02-16 00:00:00-05:00    AAPL  182.557985  ...  49701400        0.0           0.0

   [5 rows x 9 columns]

Let us compute some simple statistics on this small dataset and then publish the result.
First, let us compute a dataframe with per-year statistics of the opening price:

.. code-block:: pycon

   In [1]: import pandas as pd
      ...: outdf = df.groupby(pd.to_datetime(df.Date, utc=True).dt.year).Open.agg(['mean', 'min', 'max'])
      ...: outdf
   Out[1]:
               mean         min         max
   Date
   2024  208.790457  164.572913  257.906429
   2025  233.446786  219.548596  248.656607
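
The grouping expression above packs several steps into one line. As a
self-contained sketch, the same steps are applied below to a small hand-made
frame (the values are illustrative and not taken from the KLMS): parse the
timestamps as UTC, extract the year, group by it, and aggregate the ``Open``
column. Passing ``utc=True`` matters here because the rows carry different
UTC offsets, and converting them to UTC yields a single datetime dtype on
which ``.dt.year`` is available.

.. code-block:: python

   import pandas as pd

   # Toy data with mixed UTC offsets, similar in shape to the stock history.
   toy = pd.DataFrame(
       {
           "Date": [
               "2024-02-12 00:00:00-05:00",
               "2024-07-01 00:00:00-04:00",
               "2025-01-02 00:00:00-05:00",
           ],
           "Open": [187.5, 212.0, 248.0],
       }
   )

   # Step 1: parse the strings into timezone-aware UTC timestamps, keep the year.
   years = pd.to_datetime(toy["Date"], utc=True).dt.year

   # Step 2: group the rows by year and aggregate the opening price.
   summary = toy.groupby(years)["Open"].agg(["mean", "min", "max"])
   print(summary)

The result has one row per year and one column per aggregate, which is the
same shape as ``outdf`` above.
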

We can now publish this small dataframe under the same dataset:

.. code-block:: pycon

   In [2]: newrsrc = nyse_stock_dataset.add_dataframe(outdf, "s3://klms-bucket/stock_averages.parquet")

   In [3]: newrsrc.sxl
   Out[3]:
   cache_last_updated    None
   cache_url             None
   columns               ['mean', 'min', 'max']
   created               2025-03-17 23:34:09.448511
   dataset               nyse_stock_dataset
   datastore_active      False
   description           {"mean":{"count":2.0,"mean":221.1186218583,"st...
   format                parquet
   hash
   id                    40182a61-4e49-415f-980c-946fa7831380
   last_modified         None
   metadata_modified     2025-03-17 23:34:09.417234
   mimetype              application/parquet
   mimetype_inner        None
   name                  stock_averages
   position              3
   relation              owned
   resource_type         None
   rows                  2
   size                  56
   state                 active
   url                   s3://klms-bucket/stock_averages.parquet
   url_type              None
   Name: Resource (CLEAN), dtype: object

The dataframe was added as a new resource under the same dataset. The resource
is a Parquet file stored in the KLMS. The ``add_dataframe`` method is a
convenience method that adds a pandas DataFrame to the KLMS as a new resource.
It takes the DataFrame and an S3 URL where the file will be stored in the KLMS.
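
An S3 URL such as the one passed to ``add_dataframe`` names a bucket and an
object key. As a small sketch using only the standard library, the URL can
be taken apart as shown below. The helper name ``split_s3_url`` is made up
for this example and is not part of the client.

.. code-block:: python

   from urllib.parse import urlsplit


   def split_s3_url(url):
       """Return (bucket, key) for an s3:// URL (hypothetical helper)."""
       parts = urlsplit(url)
       if parts.scheme != "s3":
           raise ValueError(f"not an S3 URL: {url!r}")
       # The bucket is the network location; the key is the path without
       # its leading slash.
       return parts.netloc, parts.path.lstrip("/")


   bucket, key = split_s3_url("s3://klms-bucket/stock_averages.parquet")
   print(bucket, key)
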

Let us clean up by deleting the new resource and removing its file from storage:

.. code-block:: pycon

   In [4]: newrsrc.delete()

   In [5]: client.s3fs().rm("s3://klms-bucket/stock_averages.parquet")

There are many more capabilities and features to explore in the STELAR client.
Refer to the rest of the documentation for more information on how to use the
client, both interactively and programmatically.