Datasets

GE Muse XML Reader

class fusionlab.datasets.GEMuseXMLReader(path)

ECG Classification Dataset

class fusionlab.datasets.ECGClassificationDataset(annotation_file, data_dir, transform=None, channels=None, class_names=None)

CinC 2017 Dataset

ECG CSV Classification Dataset

class fusionlab.datasets.ECGCSVClassificationDataset(data_root, label_filename='REFERENCE-v3.csv', channels=['lead'], class_names=['N', 'O', 'A', '~'])

__init__(data_root, label_filename='REFERENCE-v3.csv', channels=['lead'], class_names=['N', 'O', 'A', '~'])

Parameters:

data_root (str) – root directory of the dataset
label_filename (str) – filename of the label file
channels (list) – list of target lead names
class_names (list) – list of class names, used to map each class name to a class ID
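A minimal sketch of how class_names could map the labels in label_filename to integer IDs. It assumes each name's position in the list is its ID (so N → 0, O → 1, A → 2, ~ → 3 with the defaults) and that the label file is a headerless CSV of record,class rows, as in the CinC 2017 REFERENCE-v3.csv. parse_labels is a hypothetical helper, not part of fusionlab; it takes the file contents as a string to stay self-contained.

```python
import csv
import io


def parse_labels(label_text, class_names=("N", "O", "A", "~")):
    """Map record names to integer class IDs (hypothetical helper)."""
    # Assumption: a class name's position in class_names is its ID
    name_to_id = {name: idx for idx, name in enumerate(class_names)}
    labels = {}
    for row in csv.reader(io.StringIO(label_text)):
        if not row:
            continue  # skip blank lines
        record, class_name = row[0].strip(), row[1].strip()
        labels[record] = name_to_id[class_name]  # KeyError on unknown class names
    return labels


print(parse_labels("A00001,N\nA00002,A\nA00003,~\n"))
# {'A00001': 0, 'A00002': 2, 'A00003': 3}
```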

Validate CinC 2017 Dataset

fusionlab.datasets.validate_data(csv_dir, label_path)

Check whether the number of CSV files matches the number of label files.
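The check can be sketched with the standard library, assuming the labels live in a single headerless CSV at label_path with one row per record. validate_counts is a hypothetical stand-in, not fusionlab's implementation; it skips the label file itself in case it sits inside csv_dir.

```python
import csv
from pathlib import Path


def validate_counts(csv_dir, label_path):
    """Return True if the number of .csv files matches the number of label rows."""
    label_file = Path(label_path).resolve()
    # Count signal CSVs, excluding the label file if it lives in the same folder
    n_csv = sum(
        1
        for p in Path(csv_dir).iterdir()
        if p.suffix == ".csv" and p.resolve() != label_file
    )
    with open(label_path, newline="") as f:
        n_labels = sum(1 for row in csv.reader(f) if row)  # ignore blank rows
    return n_csv == n_labels
```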

.mat to .csv

fusionlab.datasets.convert_mat_to_csv(root, target_dir='csv')

LUDB Dataset

class fusionlab.datasets.LUDBDataset(data_dir, annotation_path, transform=None, start_idx=641, end_idx=3996, lead_name='i')

Parameters:

data_dir (str) – path to the dataset folder
annotation_path (str) – path to the annotation json file
transform (callable, optional) – optional transform to be applied to a sample
start_idx (int) – start index of the signal
end_idx (int) – end index of the signal
lead_name (str) – lead name used to extract the annotation, default: ‘i’

Returns:

signal – (channels, sequence length)
label_seq – (sequence length,)

extract_signal_label(signal, label)

Extract the portion of the signal and label between start_idx and end_idx.

Parameters:

signal (np.array) – (signal length, 12)
label (np.array) – (signal length,)
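The slicing and the (channels, sequence length) return shape can be sketched as follows, assuming signal is a NumPy array of shape (signal length, 12) and that the window defaults mirror LUDBDataset's start_idx=641 and end_idx=3996. slice_window is a hypothetical name, not the library method; the end index is treated as exclusive, following Python slicing.

```python
import numpy as np


def slice_window(signal, label, start_idx=641, end_idx=3996):
    """Hypothetical stand-in for extract_signal_label.

    signal: (signal length, 12) array; label: (signal length,) array.
    """
    segment = signal[start_idx:end_idx]    # (sequence length, 12)
    label_seq = label[start_idx:end_idx]   # (sequence length,)
    return segment.T, label_seq            # channels-first: (12, sequence length)


sig = np.zeros((5000, 12))
lab = np.zeros(5000, dtype=int)
x, y = slice_window(sig, lab)
print(x.shape, y.shape)  # (12, 3355) (3355,)
```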

get_signal(DATA_FOLDER, index)

Parameters:

DATA_FOLDER (str) – path to the data folder
index (int) – patient id

Returns:

signal – array of shape (signal length, 12)

Return type:

np.array

map_annotaion_to_label_seq(annotation, sig_len)

Parameters:

annotation (dict) – annotation dict
sig_len (int) – signal length

Returns:

label_seq – label sequence of integer class indices

Return type:

np.array
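The layout of the annotation dict is not documented here. Assuming, purely for illustration, that it maps each integer class index to a list of (onset, offset) sample ranges with exclusive offsets, a per-sample label sequence could be built as below; intervals_to_label_seq is hypothetical, and 0 is assumed to mean "unannotated".

```python
import numpy as np


def intervals_to_label_seq(annotation, sig_len):
    """Build a per-sample label sequence from {class_idx: [(onset, offset), ...]}.

    Offsets are treated as exclusive; 0 marks unannotated samples.
    """
    label_seq = np.zeros(sig_len, dtype=np.int64)
    for class_idx, intervals in annotation.items():
        for onset, offset in intervals:
            # Clip both ends to [0, sig_len] so out-of-range annotations cannot fail
            lo = min(max(onset, 0), sig_len)
            hi = min(max(offset, 0), sig_len)
            label_seq[lo:hi] = class_idx
    return label_seq


print(intervals_to_label_seq({1: [(2, 4)], 2: [(6, 9)]}, 10))
# [0 0 1 1 0 0 2 2 2 0]
```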

process_annotation(export_path, lead_name='i')

Process the annotation file and save the result as JSON.

Parameters:

export_path (str) – path to save the annotation json file
lead_name (str) – lead name to extract annotation

validate_files()

Validate the number and types of the dataset files:

1. Check that the files exist.
2. Check that the file types are valid.
3. Check that the number of files is valid.

Parameters:

data_dir (str) – path to the dataset folder
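The three checks can be sketched against a plain directory as follows. validate_dir is hypothetical, and allowed_suffixes and expected_count are assumed parameters standing in for whatever LUDBDataset actually expects; the sketch raises on the first failed check.

```python
from pathlib import Path


def validate_dir(data_dir, allowed_suffixes, expected_count):
    """Hypothetical version of the three checks; raises on the first failure."""
    root = Path(data_dir)
    # 1. Files exist: the folder must exist and hold at least one file
    files = [p for p in root.iterdir() if p.is_file()] if root.is_dir() else []
    if not files:
        raise FileNotFoundError(f"no files found in {root}")
    # 2. File types are valid: every extension must be in the allowed set
    bad = sorted(p.name for p in files if p.suffix not in allowed_suffixes)
    if bad:
        raise ValueError(f"unexpected file types: {bad}")
    # 3. Number of files is valid
    if len(files) != expected_count:
        raise ValueError(f"expected {expected_count} files, found {len(files)}")
    return True
```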

fusionlab.datasets.plot(signal, label_seq, sr=500, channel='v1')

Plot a signal together with its annotation.

Utils

Download file

fusionlab.datasets.download_file(url, download_root, extract_root=None, filename=None, extract=False)

Download a file from a URL and optionally extract it to a target directory.

Parameters:

url (str) – URL to download the file from
download_root (str) – directory to place the downloaded file in
extract_root (str, optional) – directory to extract the downloaded file to
filename (str, optional) – name to save the file under; if None, the basename of the URL is used
extract (bool, optional) – if True, extract the downloaded file; otherwise, do not extract

Return type:

None
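The documented behaviour can be approximated with the standard library alone. This is an illustrative sketch, not fusionlab's code: download_and_extract is a hypothetical name, and falling back to download_root when extract_root is None is an assumption. shutil.unpack_archive infers the archive format (zip, tar, gztar, ...) from the file extension.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_and_extract(url, download_root, extract_root=None, filename=None, extract=False):
    """Stdlib sketch of download_file's documented behaviour (hypothetical)."""
    os.makedirs(download_root, exist_ok=True)
    if filename is None:
        # Use the basename of the URL path, as the parameter docs describe
        filename = os.path.basename(urllib.parse.urlparse(url).path)
    fpath = os.path.join(download_root, filename)
    urllib.request.urlretrieve(url, fpath)
    if extract:
        # Assumption: extract next to the download when no extract_root is given
        target = extract_root if extract_root is not None else download_root
        shutil.unpack_archive(fpath, target)
    return None
```

Because urllib also handles file:// URLs, the sketch can be tried locally without network access.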

Hugging Face Dataset

class fusionlab.datasets.HFDataset(dataset)

Base Hugging Face dataset wrapper class.

Parameters:

dataset – a dataset object that implements a __getitem__ method
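The wrapper's output format is not documented here. As an assumption-labeled sketch: Hugging Face's Trainer typically consumes samples as dicts whose keys match the model's forward() arguments, with targets under "labels", so a wrapper around a dataset that yields (inputs, label) pairs might look like this. DictDatasetWrapper and the "x" key are hypothetical.

```python
class DictDatasetWrapper:
    """Hypothetical wrapper that turns (inputs, label) samples into dicts."""

    def __init__(self, dataset):
        self.dataset = dataset  # any object supporting len() and indexing

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        inputs, label = self.dataset[idx]
        # Key names are assumptions; they must match the model's forward() args
        return {"x": inputs, "labels": label}


wrapped = DictDatasetWrapper([([0.1, 0.2], 0), ([0.3, 0.4], 1)])
print(wrapped[1])  # {'x': [0.3, 0.4], 'labels': 1}
```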

Label Studio Time Series Segmentation Dataset

class fusionlab.datasets.LSTimeSegDataset(data_dir, annotation_path, class_map, column_names)

Dataset for Label Studio time-series segmentation tasks.

__init__(data_dir, annotation_path, class_map, column_names)

Parameters:

data_dir (str) – directory containing the CSV files
annotation_path (str) – path to annotation json file
class_map (dict) – a dictionary mapping class names to class indices
column_names (List[str]) – A list of column names for the signal data in the CSV files.

Examples:

>>> from fusionlab.datasets import LSTimeSegDataset
>>> ds = LSTimeSegDataset(data_dir="./12",
...                       annotation_path="./12.json",
...                       class_map={"N": 1, "p": 2, "t": 3},
...                       column_names=['i', 'ii', 'iii', 'avr', 'avl', 'avf', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6'])
>>> signals, mask = ds[0]
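A sketch of turning Label Studio time-series regions into a per-sample mask like the one returned above. It assumes the JSON-export layout task["annotations"][0]["result"], where each region's value carries start, end, and a timeserieslabels list; that start and end are sample indices with an inclusive end; and that 0 marks unlabeled samples, since the example class_map starts at 1. regions_to_mask is a hypothetical helper, not the LSTimeSegDataset implementation.

```python
def regions_to_mask(task, class_map, n_samples):
    """Build a per-sample class mask from one Label Studio task (hypothetical)."""
    mask = [0] * n_samples  # 0 = unlabeled (assumption)
    for region in task["annotations"][0]["result"]:
        value = region["value"]
        class_idx = class_map[value["timeserieslabels"][0]]
        start, end = int(value["start"]), int(value["end"])
        # Assumption: start/end are sample indices and end is inclusive
        for i in range(max(start, 0), min(end + 1, n_samples)):
            mask[i] = class_idx
    return mask


task = {"annotations": [{"result": [
    {"value": {"start": 1, "end": 2, "timeserieslabels": ["p"]}},
    {"value": {"start": 4, "end": 5, "timeserieslabels": ["t"]}},
]}]}
print(regions_to_mask(task, {"N": 1, "p": 2, "t": 3}, 7))  # [0, 2, 2, 0, 3, 3, 0]
```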

Read CSV

fusionlab.datasets.read_csv(fname)