Package Documentation¶
Associator Loss¶
- class neural_semigroups.AssociatorLoss(*args: Any, **kwargs: Any)¶
probabilistic associator loss
- __init__(discrete: bool = False)¶
- Parameters
discrete – when False, the KL divergence is used for measuring associativity in a continuous way. When True, returns 1 for associative and 0 for non-associative samples.
- forward(cayley_cubes: torch.Tensor) torch.Tensor ¶
finds a probabilistic associator of a given probabilistic Cayley cube
First, we build two 4-index tensors representing probability distributions of products \(a\left(bc\right)\) and \(\left(ab\right)c\), respectively:
\(T_{ijkl}=P\left\{e_i\left(e_je_k\right)=e_l\right\}= \sum\limits_{m=1}^nP\left\{e_ie_m=e_l\vert e_je_k=e_m\right\} P\left\{e_je_k=e_m\right\}=\sum\limits_{m=1}^na_{iml}a_{jkm}\)
and
\(T\prime_{ijkl}=P\left\{\left(e_ie_j\right)e_k=e_l\right\}= \sum\limits_{m=1}^nP\left\{e_me_k=e_l\vert e_ie_j=e_m\right\} P\left\{e_ie_j=e_m\right\}=\sum\limits_{m=1}^na_{mkl}a_{ijm}\)
Then we calculate Kullback-Leibler divergence between \(T_{ijkl}\) and \(T\prime_{ijkl}\) to find a continuous measure of associativity of the input table.
- Parameters
cayley_cubes – a batch of probabilistic Cayley cubes
- Returns
the probabilistic associator
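The two contractions above can be written out index by index. The following is a minimal sketch in plain Python (nested lists instead of `torch.Tensor`, one cube instead of a batch). The helper names `one_hot_cube`, `associator_tensors` and `kl_divergence`, the direction of the divergence, the `eps` guard and the summation over all index triples are assumptions made for illustration and are not taken from the library.

```python
import math


def one_hot_cube(table):
    """Turn a Cayley table into a cube a[i][j][k] = P{e_i e_j = e_k}."""
    n = len(table)
    return [[[1.0 if table[i][j] == k else 0.0 for k in range(n)]
             for j in range(n)] for i in range(n)]


def associator_tensors(a):
    """Return (T, T') where
    T[i][j][k][l]  = sum_m a[i][m][l] * a[j][k][m]   (product a(bc))
    T'[i][j][k][l] = sum_m a[m][k][l] * a[i][j][m]   (product (ab)c)
    """
    idx = range(len(a))
    t = [[[[sum(a[i][m][l] * a[j][k][m] for m in idx) for l in idx]
           for k in idx] for j in idx] for i in idx]
    t_prime = [[[[sum(a[m][k][l] * a[i][j][m] for m in idx) for l in idx]
                 for k in idx] for j in idx] for i in idx]
    return t, t_prime


def kl_divergence(t, t_prime, eps=1e-12):
    """Sum over (i, j, k) of KL(T[i][j][k] || T'[i][j][k]); assumed form."""
    idx = range(len(t))
    total = 0.0
    for i in idx:
        for j in idx:
            for k in idx:
                for l in idx:
                    p, q = t[i][j][k][l], t_prime[i][j][k][l]
                    if p > 0.0:
                        # eps guards against division by zero
                        total += p * math.log(p / max(q, eps))
    return total


z2 = one_hot_cube([[0, 1], [1, 0]])          # cyclic group of order 2
non_assoc = one_hot_cube([[1, 1], [0, 1]])   # 0(00) = 1 but (00)0 = 0
```

For the associative table both tensors coincide and the divergence vanishes, while for the second table they differ at \(i=j=k=0\), so the divergence is strictly positive.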
Constant Baseline¶
- class neural_semigroups.ConstantBaseline(*args: Any, **kwargs: Any)¶
A model that always fills in the same number
- __init__(cardinality: int, fill_in_with: int = 0)¶
initializes a constant model
>>> ConstantBaseline(2, 3)
Traceback (most recent call last):
    ...
ValueError: `fill_in_with` should be non-negative and less than `cardinality`, got 3 >= 2
>>> ConstantBaseline(2, 1).constant_distribution.cpu()
tensor([0., 1.])
- Parameters
cardinality – the number of elements in a magma
fill_in_with – an item which will be suggested as a correct answer
- forward(cayley_cube: torch.Tensor) torch.Tensor ¶
forward pass inherited from Module
>>> ConstantBaseline(2, 1)(torch.tensor([
...     [[[0., 1.], [0.5, 0.5]], [[1., 0.], [0., 1.]]],
...     [[[0., 1.], [1.0, 0.0]], [[0.5, 0.5], [0., 1.]]]
... ]).to(CURRENT_DEVICE)).cpu()
tensor([[[[0., 1.], [0., 1.]], [[1., 0.], [0., 1.]]], [[[0., 1.], [1., 0.]], [[0., 1.], [0., 1.]]]])
- Parameters
cayley_cube – probabilistic representation of a magma
- Returns
a batch of constant values (set in the constructor)
Cyclic Group¶
Denoising Autoencoder for Magmas¶
- class neural_semigroups.MagmaDAE(*args: Any, **kwargs: Any)¶
Denoising Autoencoder for probability Cayley cubes of magmas
- __init__(cardinality: int, hidden_dims: List[int], do_reparametrization: bool = False)¶
- Parameters
cardinality – the number of elements in a magma
hidden_dims – a list of sizes of hidden layers of the encoder and the decoder
do_reparametrization – if True, adds a reparametrization trick
- decode(encoded_input: torch.Tensor) torch.Tensor ¶
represent an embedding vector as something with size aligned with the input
- Parameters
encoded_input – an embedding vector
- Returns
a vector of values from 0 to 1 (kind of probabilities)
- encode(input_with_noise: torch.Tensor) torch.Tensor ¶
represent input cube as an embedding vector
- Parameters
input_with_noise – a tensor with two indices
- Returns
some tensor with two indices and non-negative values
- forward(cayley_cubes: torch.Tensor) torch.Tensor ¶
forward pass inherited from Module
- Parameters
cayley_cubes – a batch of probabilistic representations of magmas
- Returns
auto-encoded probabilistic representations of magmas
- reparametrize(mu_and_log_sigma: torch.Tensor) torch.Tensor ¶
do a reparametrization trick
- Parameters
mu_and_log_sigma – a vector of expectations and logarithms of standard deviations
- Returns
sample from a distribution
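A reparametrization trick draws a sample as a deterministic function of the parameters plus independent noise, \(z=\mu+\sigma\varepsilon\) with \(\varepsilon\sim N(0,1)\). Below is a minimal sketch in plain Python. The layout of the input (means in the first half, logarithms of standard deviations in the second half) and the name `reparametrize_sketch` are assumptions, since the documentation does not state how `mu_and_log_sigma` is packed.

```python
import math
import random


def reparametrize_sketch(mu_and_log_sigma, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Assumption: the first half of the vector holds the means and the
    second half holds logarithms of standard deviations.
    """
    half = len(mu_and_log_sigma) // 2
    mu = mu_and_log_sigma[:half]
    log_sigma = mu_and_log_sigma[half:]
    return [m + math.exp(s) * rng.gauss(0.0, 1.0)
            for m, s in zip(mu, log_sigma)]


# with a tiny standard deviation the sample stays next to the mean
sample = reparametrize_sketch([0.5, -1.0, -50.0, -50.0], random.Random(0))
```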
Magma¶
- class neural_semigroups.Magma(cayley_table: Optional[torch.Tensor] = None, cardinality: Optional[int] = None)¶
an implementation of a magma (a set with a binary operation)
- __init__(cayley_table: Optional[torch.Tensor] = None, cardinality: Optional[int] = None)¶
constructs a new magma
>>> seed = torch.manual_seed(11)
>>> Magma(cardinality=2)
tensor([[1, 1], [0, 1]])
>>> Magma(torch.tensor([[0, 1], [1, 0]]))
tensor([[0, 1], [1, 0]])
>>> Magma()
Traceback (most recent call last):
    ...
ValueError: at least one argument must be given
>>> Magma(torch.tensor([[0]]), cardinality=2)
Traceback (most recent call last):
    ...
ValueError: cayley_table must be a `torch.Tensor` of shape (cardinality, cardinality)
- Parameters
cayley_table – a Cayley table for a magma. If not provided, a random table is generated.
cardinality – a number of elements in a magma to generate a random one
- property cardinality: int¶
number of elements in a magma
- property has_inverses: bool¶
check whether there are solutions of equations \(ax=b\) and \(xa=b\)
- property identity: int¶
find an identity element in a Cayley table
- Returns
the index of the identity element or -1 if there is no identity
- property is_associative: bool¶
check associativity of a Cayley table
- Returns
whether the input table is associative or not
- property is_commutative: bool¶
check commutativity of a Cayley table
- Returns
whether the input table is commutative or not
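The checks behind the properties above are short enough to spell out. This is a sketch on nested lists rather than the library's tensor code, and the function names are chosen to mirror the properties only for readability. Reading `has_inverses` as "every row and every column of the table contains each element" is an assumption derived from the solvability of \(ax=b\) and \(xa=b\).

```python
from itertools import product


def is_associative(table):
    n = len(table)
    return all(table[table[i][j]][k] == table[i][table[j][k]]
               for i, j, k in product(range(n), repeat=3))


def is_commutative(table):
    n = len(table)
    return all(table[i][j] == table[j][i]
               for i, j in product(range(n), repeat=2))


def identity(table):
    """Index of a two-sided identity element, or -1 if there is none."""
    n = len(table)
    for e in range(n):
        if all(table[e][x] == x and table[x][e] == x for x in range(n)):
            return e
    return -1


def has_inverses(table):
    """a*x = b and x*a = b are solvable for all a, b exactly when every
    row and every column is a permutation of the elements."""
    n = len(table)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in table)
    columns_ok = all({table[i][j] for i in range(n)} == full
                     for j in range(n))
    return rows_ok and columns_ok


z2 = [[0, 1], [1, 0]]       # cyclic group of order 2
other = [[1, 1], [0, 1]]    # a magma that is not a semigroup
```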
- property next_magma: Magma¶
goes to the next magma Cayley table in their lexicographical order
>>> Magma(torch.tensor([[0, 1], [1, 0]])).next_magma
tensor([[0, 1], [1, 1]])
>>> Magma(torch.tensor([[0, 1], [1, 1]])).next_magma
tensor([[1, 0], [0, 0]])
>>> Magma(torch.tensor([[1, 1], [1, 1]])).next_magma
Traceback (most recent call last):
    ...
ValueError: there is no next magma!
- Returns
another magma
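The doctest results above are consistent with reading a Cayley table row by row as a number written in base \(n\) and adding one to it. A sketch of that reading on nested lists follows; `next_table` is a hypothetical name, and whether the library implements the step this way is not confirmed by the documentation.

```python
def next_table(table):
    """Increment a Cayley table read row by row as a base-n number."""
    n = len(table)
    digits = [cell for row in table for cell in row]
    position = len(digits) - 1
    # carry over trailing digits that are already maximal
    while position >= 0 and digits[position] == n - 1:
        digits[position] = 0
        position -= 1
    if position < 0:
        raise ValueError("there is no next magma!")
    digits[position] += 1
    return [digits[r * n:(r + 1) * n] for r in range(n)]
```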
- property probabilistic_cube: torch.Tensor¶
a 3d array \(a\) where \(a_{ijk}=P\left\{e_ie_j=e_k\right\}\)
- Returns
a probabilistic cube representation of a Cayley table
- random_isomorphism() torch.Tensor ¶
get some Cayley table isomorphic to self.cayley_table, for example
>>> Magma(torch.tensor([[0, 0], [0, 0]])).random_isomorphism()
tensor([[1, 1], [1, 1]])
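An isomorphic table is obtained by relabelling the elements with a permutation \(p\) so that \(p(e_i)p(e_j)=p(e_ie_j)\). The sketch below works on nested lists; `relabel` and `random_isomorphic_table` are hypothetical names, and drawing the permutation uniformly is an assumption.

```python
import random


def relabel(table, permutation):
    """Apply a relabelling p: new[p[i]][p[j]] = p[old[i][j]]."""
    n = len(table)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            new[permutation[i]][permutation[j]] = permutation[table[i][j]]
    return new


def random_isomorphic_table(table, rng=random):
    """Relabel the elements with a randomly shuffled permutation."""
    permutation = list(range(len(table)))
    rng.shuffle(permutation)
    return relabel(table, permutation)
```

Swapping the two elements of the constant table `[[0, 0], [0, 0]]` gives `[[1, 1], [1, 1]]`, which is the table shown in the example above.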
Precise Guess Loss¶
Datasets¶
Random Dataset¶
- class neural_semigroups.datasets.RandomDataset(*args: Any, **kwargs: Any)¶
an iterable dataset having fixed length and returning random tensors of pre-defined shape
>>> data = RandomDataset(2, ([5, 2], [1]))
>>> print([item.shape for item in data[1]])
[torch.Size([5, 2]), torch.Size([1])]
>>> for row in data:
...     print([item.shape for item in row])
...     break
[torch.Size([5, 2]), torch.Size([1])]
>>> data = RandomDataset(3, [4, 4])
>>> print(data[1].shape)
torch.Size([4, 4])
>>> for row in data:
...     print(row.shape)
...     break
torch.Size([4, 4])
>>> print(len(data))
3
- __init__(data_size: int, data_dim: Union[torch.Size, Tuple[torch.Size, ...]])¶
Semigroups Dataset¶
- class neural_semigroups.datasets.SemigroupsDataset(*args: Any, **kwargs: Any)¶
an extension of torch.utils.data.TensorDataset similar to a private class torchvision.datasets.vision.VisionDataset
- __init__(root: str, cardinality: int, transform: Optional[Callable] = None)¶
- Parameters
root – root directory of dataset
cardinality – a semigroup cardinality to use.
transform – a function/transform that takes in a Cayley table and returns a transformed version.
Smallsemi Dataset¶
- class neural_semigroups.datasets.Smallsemi(*args: Any, **kwargs: Any)¶
a torch.utils.data.Dataset wrapper for the data from https://www.gap-system.org/Packages/smallsemi.html
>>> import os
>>> import shutil
>>> from neural_semigroups.constants import TEST_TEMP_DATA
>>> shutil.rmtree(TEST_TEMP_DATA, ignore_errors=True)
>>> os.mkdir(TEST_TEMP_DATA)
>>> smallsemi = Smallsemi(root=TEST_TEMP_DATA, cardinality=2)
Traceback (most recent call last):
    ...
ValueError: test_temp_data must have exactly one version of smallsemi
>>> smallsemi = Smallsemi(
...     root=TEST_TEMP_DATA,
...     cardinality=2,
...     download=True,
...     transform=lambda x: x
... )
>>> smallsemi[0][0]
tensor([[0, 0], [0, 0]])
- __init__(root: str, cardinality: int, download: bool = False, transform: Optional[Callable] = None)¶
- Parameters
root – root directory of dataset where smallsemi-*/data/data2to7/ exists.
cardinality – a semigroup cardinality to use. Corresponds to data{cardinality}.gl.gz.
download – if true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform – a function/transform that takes in a Cayley table and returns a transformed version.
- download() None ¶
downloads, unzips and moves smallsemi data
- load_data_and_labels_from_smallsemi() None ¶
loads data from smallsemi package
Mace4 Semigroups Dataset¶
- class neural_semigroups.datasets.Mace4Semigroups(*args: Any, **kwargs: Any)¶
a torch.utils.data.Dataset wrapper for the data of mace4 output stored in an sqlite database
>>> import shutil
>>> from neural_semigroups.constants import TEST_TEMP_DATA
>>> import os
>>> from neural_semigroups.generate_data_with_mace4 import (
...     generate_data_with_mace4)
>>> shutil.rmtree(TEST_TEMP_DATA, ignore_errors=True)
>>> os.mkdir(TEST_TEMP_DATA)
>>> database = os.path.join(TEST_TEMP_DATA, "test.db")
>>> torch.manual_seed(42)
<torch...
>>> generate_data_with_mace4([
...     "--max_dim", "2",
...     "--min_dim", "2",
...     "--number_of_tasks", "1",
...     "--database_name", database])
>>> mace4_semigroups = Mace4Semigroups(
...     root=database,
...     cardinality=2,
...     transform=lambda x: x
... )
>>> mace4_semigroups[0][0]
tensor([[0, 0], [0, 0]])
>>> mace4_semigroups.get_table_from_output("not a mace4 output file")
Traceback (most recent call last):
    ...
ValueError: wrong mace4 output file format!
- __init__(cardinality: int, root: str, transform: Optional[Callable] = None)¶
- Parameters
root – a full path to an sqlite database file which has a table mace_output with a string column output
cardinality – the cardinality of semigroups
transform – a function/transform that takes a Cayley table and returns a transformed version.
- get_additional_info(cursor: Cursor) int ¶
gets some info from an SQLite database with mace4 outputs
- Parameters
cursor – an SQLite database cursor
- Returns
a total number of rows in a table, a magma dimension
- get_table_from_output(output: str) torch.Tensor ¶
gets a Cayley table of a magma from the output of mace4
- Parameters
output – output of mace4
- Returns
a Cayley table
- load_data_from_mace_output() None ¶
loads data generated by mace4 from an sqlite database
utils¶
A collection of different functions used by other modules.
- neural_semigroups.utils.random_semigroup(dim: int, maximal_tries: int) Tuple[bool, torch.Tensor] ¶
randomly search for a semigroup Cayley table. Not recommended to use with dim > 4
- Parameters
dim – number of elements in a semigroup
maximal_tries – how many times to try at most
- Returns
a pair (whether the Cayley table is associative, a Cayley table of a magma)
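A random search of this kind can be sketched as "draw a uniformly random table, test associativity, repeat". The version below uses nested lists and the standard `random` module. The name `random_semigroup_sketch`, the uniform sampling and the choice to return the last table drawn on failure are assumptions. The number of tables of size `dim` is `dim ** (dim * dim)`, which suggests why such a search becomes impractical for larger `dim`.

```python
import random
from itertools import product


def random_semigroup_sketch(dim, maximal_tries, rng=random):
    """Draw random tables until an associative one appears.

    Returns (found, table); when found is False, the table is the
    last one that was drawn.
    """
    if maximal_tries < 1:
        raise ValueError("maximal_tries must be positive")
    table = []
    for _ in range(maximal_tries):
        table = [[rng.randrange(dim) for _ in range(dim)]
                 for _ in range(dim)]
        if all(table[table[i][j]][k] == table[i][table[j][k]]
               for i, j, k in product(range(dim), repeat=3)):
            return True, table
    return False, table


found, table = random_semigroup_sketch(2, 1000, random.Random(0))
```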
- neural_semigroups.utils.get_magma_by_index(cardinality: int, index: int) Magma ¶
find a magma from a lexicographical order by its index
- Parameters
cardinality – the number of elements in a magma
index – an index of magma in a lexicographical order
- Returns
a magma with a given index
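One way to realise a lexicographical enumeration is to write the index in base `cardinality` with `cardinality ** 2` digits and read the digits row by row. The sketch below does that on nested lists. Counting indices from zero and the name `table_by_index` are assumptions, because the documentation does not say where the enumeration starts.

```python
def table_by_index(cardinality, index):
    """Write `index` in base `cardinality` using cardinality**2 digits
    and read the digits row by row as a Cayley table."""
    n = cardinality
    if not 0 <= index < n ** (n * n):
        raise ValueError("index out of range")
    digits = []
    for _ in range(n * n):
        index, digit = divmod(index, n)
        digits.append(digit)
    digits.reverse()  # most significant digit first
    return [digits[r * n:(r + 1) * n] for r in range(n)]
```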
- neural_semigroups.utils.import_smallsemi_format(lines: List[str]) torch.Tensor ¶
imports lines in a format used by the smallsemi GAP package. Format description:
filename is of a form data[n].gl, \(1\le n\le7\)
lines are separated by a pair of symbols \r\n
there are exactly \(n^2\) lines in a file
the first line is a header starting with ‘#’ symbol
each line is a string of \(N\) digits from \(0\) to \(n-1\)
\(N\) is the number of semigroups in the database
each column represents a serialised Cayley table
the database contains only cells starting from the second
the first cell of each Cayley table is assumed to be filled with 0
- Parameters
lines – lines read from a file of smallsemi format
- Returns
a list of Cayley tables
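The format description above translates into a column-wise read: skip the header, take the remaining \(n^2-1\) lines as cells two through \(n^2\), and prepend a zero for the first cell. The following sketch returns nested lists instead of a tensor. The name `parse_smallsemi_lines` is hypothetical and the example content is made up to match the described shape; it is not copied from a real `data2.gl` file.

```python
from math import isqrt


def parse_smallsemi_lines(lines):
    """Rebuild Cayley tables from column-wise serialised lines."""
    n = isqrt(len(lines))
    if n * n != len(lines) or not lines[0].startswith("#"):
        raise ValueError("unexpected number of lines or missing header")
    rows = [line.strip() for line in lines[1:]]  # n * n - 1 rows of digits
    number_of_tables = len(rows[0]) if rows else 0
    tables = []
    for column in range(number_of_tables):
        # the first cell is not stored and is assumed to be 0
        cells = [0] + [int(row[column]) for row in rows]
        tables.append([cells[r * n:(r + 1) * n] for r in range(n)])
    return tables


# made-up content in the shape described above (header + 3 rows, N = 4)
example = ["# hypothetical header\r\n", "0010\r\n", "0011\r\n", "0101\r\n"]
tables = parse_smallsemi_lines(example)
```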
- neural_semigroups.utils.download_file_from_url(url: str, filename: str, buffer_size: int = 1024) None ¶
downloads some file from the Web to a specified destination
>>> import os
>>> TEMP_FILE = "test.html"
>>> if os.path.exists(TEMP_FILE):
...     os.remove(TEMP_FILE)
>>> download_file_from_url("https://python.org/", TEMP_FILE)
>>> os.path.exists(TEMP_FILE)
True
- Parameters
url – a valid HTTP URL
filename – a valid filename
buffer_size – a number of bytes to read from URL at once
- neural_semigroups.utils.find_substring_by_pattern(strings: List[str], starts_with: str, ends_before: str) str ¶
search for a first occurrence of a given pattern in a string list
>>> some_strings = ["one", "two", "three"]
>>> find_substring_by_pattern(some_strings, "t", "o")
'tw'
>>> find_substring_by_pattern(some_strings, "four", "five")
Traceback (most recent call last):
    ...
ValueError: pattern four.*five not found
- Parameters
strings – a list of strings where the pattern is searched for
starts_with – the first letters of a pattern
ends_before – a substring which marks the beginning of something different
- Returns
a pattern which starts with starts_with and ends before ends_before
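The behaviour shown in the doctest can be reproduced with a regular expression that captures everything from `starts_with` up to, but not including, `ends_before`. This is a sketch with the standard `re` module; the name `find_substring_sketch`, the escaping of both arguments and the greedy `.*` are assumptions rather than the library's documented behaviour.

```python
import re


def find_substring_sketch(strings, starts_with, ends_before):
    """Return the first match of starts_with.* that precedes ends_before."""
    pattern = re.compile(
        "(" + re.escape(starts_with) + ".*)" + re.escape(ends_before)
    )
    for string in strings:
        match = pattern.search(string)
        if match is not None:
            return match.group(1)
    raise ValueError(f"pattern {starts_with}.*{ends_before} not found")
```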
- neural_semigroups.utils.get_newest_file(dir_path: str) str ¶
get the last modified file from a directory
>>> import shutil
>>> from pathlib import Path
>>> from os import makedirs, path
>>> from neural_semigroups.constants import TEST_TEMP_DATA
>>> shutil.rmtree(path.join(TEST_TEMP_DATA, "tmp"), ignore_errors=True)
>>> makedirs(path.join(TEST_TEMP_DATA, "tmp"))
>>> Path(path.join(TEST_TEMP_DATA, "tmp", "one")).touch()
>>> from time import sleep
>>> sleep(0.01)
>>> Path(path.join(TEST_TEMP_DATA, "tmp", "two")).touch()
>>> get_newest_file(path.join(TEST_TEMP_DATA, "tmp"))
'test_temp_data/tmp/two'
- Parameters
dir_path – a directory path
- Returns
the last modified file’s name
- neural_semigroups.utils.make_discrete(cayley_cubes: torch.Tensor) torch.Tensor ¶
transforms a batch of probabilistic Cayley cubes in the following way:
maximal probabilities in the last dimension become ones
all other probabilities become zeros
>>> make_discrete(torch.tensor([
...     [[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]],
...     [[[0.7, 0.3], [0.3, 0.7]], [[0.7, 0.3], [0.3, 0.7]]],
... ]))
tensor([[[[1., 0.], [0., 1.]], [[1., 0.], [0., 1.]]], [[[1., 0.], [0., 1.]], [[1., 0.], [0., 1.]]]])
- Parameters
cayley_cubes – a batch of probabilistic cubes representing Cayley tables
- Returns
a batch of probabilistic cubes filled in with 0 or 1
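The transformation amounts to replacing each distribution along the last index by a one-hot vector at its maximum. Below is a sketch for a single cube given as nested lists (the library function works on a batch of tensors). The name `make_discrete_sketch` is hypothetical, and resolving ties in favour of the first maximum is an assumption.

```python
def make_discrete_sketch(cube):
    """Replace every distribution over the last index by a one-hot
    vector located at its (first) maximum."""
    result = []
    for plane in cube:
        new_plane = []
        for distribution in plane:
            size = len(distribution)
            winner = max(range(size), key=distribution.__getitem__)
            new_plane.append(
                [1.0 if k == winner else 0.0 for k in range(size)]
            )
        result.append(new_plane)
    return result


discrete = make_discrete_sketch(
    [[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]]
)
```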
- neural_semigroups.utils.count_different(one: torch.Tensor, two: torch.Tensor) torch.Tensor ¶
given two batches of the same size, counts number of positions in these batches, on which the tensor from the first batch differs from the second
- Parameters
one – one batch of tensors
two – another batch of tensors
- Returns
the number of different tensors
- neural_semigroups.utils.hide_cells(cayley_table: torch.Tensor, number_of_cells: int) torch.Tensor ¶
set several cells in a Cayley table to \(-1\)
>>> torch.manual_seed(42)
<torch...
>>> hide_cells(torch.tensor([[0, 1], [2, 3]]), 2).cpu()
tensor([[ 0, 1], [-1, -1]])
- Parameters
cayley_table – a Cayley table
number_of_cells – a number of cells to hide
- Returns
a Cayley table with hidden cells
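Hiding cells can be sketched as choosing `number_of_cells` distinct positions at random and overwriting them with \(-1\). The version below uses nested lists and the standard `random` module, so it will not reproduce the cells picked in the doctest above. The name `hide_cells_sketch` and sampling without replacement are assumptions.

```python
import random


def hide_cells_sketch(table, number_of_cells, rng=random):
    """Return a copy of the table with randomly chosen cells set to -1."""
    n = len(table)
    hidden = rng.sample(range(n * n), number_of_cells)
    result = [row[:] for row in table]  # leave the input untouched
    for position in hidden:
        result[position // n][position % n] = -1
    return result


original = [[0, 1], [2, 3]]
masked = hide_cells_sketch(original, 2, random.Random(42))
```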
- neural_semigroups.utils.read_whole_file(filename: str) str ¶
reads the whole file into a string, for example
>>> read_whole_file("README.rst").split("\n")[3]
'Neural Semigroups'
- Parameters
filename – a name of the file to read
- Returns
whole contents of the file
- neural_semigroups.utils.partial_table_to_cube(table: torch.Tensor) torch.Tensor ¶
create a probabilistic cube from a partially filled Cayley table
-1 is translated to \(\frac1n\) where \(n\) is table’s cardinality, for example
>>> partial_table_to_cube(torch.tensor([[0, -1], [0, 0]])).cpu()
tensor([[[1.0000, 0.0000], [0.5000, 0.5000]], [[1.0000, 0.0000], [1.0000, 0.0000]]])
- Parameters
table – a Cayley table, partially filled by -1’s
- Returns
a probabilistic cube
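The rule can be sketched on nested lists: a known cell becomes a one-hot distribution and a hidden cell (marked by \(-1\)) becomes the uniform distribution with probability \(\frac1n\) for each element. `partial_table_to_cube_sketch` is a hypothetical name for this illustration and is not the library's tensor implementation.

```python
def partial_table_to_cube_sketch(table):
    """Known cells become one-hot rows, hidden cells (-1) become uniform."""
    n = len(table)
    cube = []
    for row in table:
        plane = []
        for cell in row:
            if cell == -1:
                plane.append([1.0 / n] * n)
            else:
                plane.append([1.0 if k == cell else 0.0 for k in range(n)])
        cube.append(plane)
    return cube


cube = partial_table_to_cube_sketch([[0, -1], [0, 0]])
```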
- neural_semigroups.utils.connect_to_db(database_name: str) Cursor ¶
open a connection to an SQLite database
- Parameters
database_name – filename of a database
- Returns
a cursor to the database
- neural_semigroups.utils.create_table_if_not_exists(cursor: Cursor, table_name: str, columns: List[str]) None ¶
create a table if it does not exist
- Parameters
cursor – a cursor to the database where to create a table
table_name – what table to create
columns – a list of strings of format “COLUMN_NAME COLUMN_TYPE”
- Returns
- neural_semigroups.utils.insert_values_into_table(cursor: Cursor, table_name: str, values: Tuple[str, ...]) None ¶
inserts values into a table
- Parameters
cursor – a cursor to database where the target table is located
table_name – the target table
values – values to insert into the target table
- Returns
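The three database helpers above correspond to familiar steps with the standard `sqlite3` module. The sketch below performs those steps directly and does not call the library or show its internals. The table name `mace_output` and the column `output` follow the description of `Mace4Semigroups`; the in-memory database and the inserted string are assumptions made so that the example leaves no file behind.

```python
import sqlite3

# open a connection and get a cursor (a file name works the same way)
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# create the table only if it is missing; columns are "NAME TYPE" strings
columns = ["output TEXT"]
cursor.execute(
    "CREATE TABLE IF NOT EXISTS mace_output (" + ", ".join(columns) + ")"
)

# insert one row using a parameterised statement
cursor.execute("INSERT INTO mace_output VALUES (?)", ("some mace4 output",))
connection.commit()

cursor.execute("SELECT COUNT(*) FROM mace_output")
row_count = cursor.fetchone()[0]
connection.close()
```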
- neural_semigroups.utils.gunzip(archive_path: str) None ¶
extracts a GZIP file in the same folder
- Parameters
archive_path – a path ending with .gz
- Returns
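Extracting a GZIP file next to the archive can be sketched with the standard `gzip` and `shutil` modules. `gunzip_sketch` is a hypothetical name; unlike the documented function, which returns `None`, the sketch returns the path of the extracted file to make the usage example shorter. The file name and contents in the example are made up.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_sketch(archive_path):
    """Write the decompressed contents next to the archive,
    dropping the '.gz' suffix from the file name."""
    if not archive_path.endswith(".gz"):
        raise ValueError("expected a path ending with .gz")
    target_path = archive_path[:-len(".gz")]
    with gzip.open(archive_path, "rb") as source:
        with open(target_path, "wb") as target:
            shutil.copyfileobj(source, target)
    return target_path


# round trip inside a temporary folder
with tempfile.TemporaryDirectory() as folder:
    archive = os.path.join(folder, "data2.gl.gz")
    with gzip.open(archive, "wb") as handle:
        handle.write(b"0010\r\n")
    with open(gunzip_sketch(archive), "rb") as handle:
        extracted = handle.read()
```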
generate_data_with_mace4¶
A script which generates semigroups with mace4 and saves them in an sqlite database.
- neural_semigroups.generate_data_with_mace4.generate_data_with_mace4(input_args: Optional[List[str]] = None) None ¶
- Parameters
input_args – a list of arguments (if None, the ones from the command line are used)
- Returns