Datasets#

GE Muse XML Reader#

class fusionlab.datasets.GEMuseXMLReader(path)[source]#

ECG Classification Dataset#

class fusionlab.datasets.ECGClassificationDataset(annotation_file, data_dir, transform=None, channels=None, class_names=None)[source]#

CinC 2017 Dataset#

ECG CSV Classification Dataset#

class fusionlab.datasets.ECGCSVClassificationDataset(data_root, label_filename='REFERENCE-v3.csv', channels=['lead'], class_names=['N', 'O', 'A', '~'])[source]#
__init__(data_root, label_filename='REFERENCE-v3.csv', channels=['lead'], class_names=['N', 'O', 'A', '~'])[source]#
Parameters:
  • data_root (str) – root directory of the dataset

  • label_filename (str) – filename of the label file

  • channels (list) – list of target lead names

  • class_names (list) – list of class names for mapping class name to class id
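
The class_names list acts as a lookup table from label string to integer class id. A minimal sketch of that mapping, assuming the id is simply the position of the name in the list (the page does not confirm this). The names class_to_id and encode_label are illustrative and not part of fusionlab:

```python
# Only the default class_names list is taken from the signature above.
class_names = ['N', 'O', 'A', '~']

# Assumption: a class id is the index of its name in class_names.
class_to_id = {name: idx for idx, name in enumerate(class_names)}


def encode_label(name):
    """Return the integer class id for a label string (hypothetical helper)."""
    return class_to_id[name]
```

Under this assumption, reordering class_names changes the ids, so the same list should be used for training and evaluation.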

Validate CinC 2017 Dataset#

fusionlab.datasets.validate_data(csv_dir, label_path)[source]#

Check that the number of CSV files matches the number of labels.
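
The check described above might look roughly like the following sketch. It is not the library's implementation: counts_match is a hypothetical name, and the code assumes the label file is a headerless CSV with one row per record.

```python
import csv
from pathlib import Path


def counts_match(csv_dir, label_path):
    """Return True when the number of CSV files equals the number of label rows."""
    # Count the signal files found directly inside csv_dir.
    n_csv = sum(1 for _ in Path(csv_dir).glob("*.csv"))
    # Assumption: one non-empty row per record and no header line.
    with open(label_path, newline="") as f:
        n_labels = sum(1 for row in csv.reader(f) if row)
    return n_csv == n_labels
```

Note that a label file stored inside csv_dir would itself be counted as a signal file by this sketch, so the two are best kept in separate directories.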

.mat to .csv#

fusionlab.datasets.convert_mat_to_csv(root, target_dir='csv')[source]#

LUDB Dataset#

LUDB Dataset#

class fusionlab.datasets.LUDBDataset(data_dir, annotation_path, transform=None, start_idx=641, end_idx=3996, lead_name='i')[source]#
Parameters:
  • data_dir (str) – path to the dataset folder

  • annotation_path (str) – path to the annotation json file

  • transform (callable, optional) – Optional transform to be applied on a sample.

  • start_idx (int) – start index of the signal

  • end_idx (int) – end index of the signal

  • lead_name (str) – lead name to extract annotation, default: ‘i’

Returns:
  • signal – (channels, sequence length)

  • label_seq – (sequence length,)

extract_signal_label(signal, label)[source]#

Extract the signal and label between the start and end indices.

Parameters:
  • signal (np.array) – (signal length, 12)

  • label (np.array) – (signal length,)
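
A minimal sketch of this cropping step, using plain Python lists in place of np.array so that it runs without dependencies (the same slice syntax applies to NumPy arrays). The function name crop is hypothetical, and the half-open interval [start_idx, end_idx) is an assumption:

```python
def crop(signal, label, start_idx=641, end_idx=3996):
    """Keep the samples in [start_idx, end_idx) of both signal and label."""
    # Assumption: end_idx is exclusive, as in ordinary Python slicing.
    return signal[start_idx:end_idx], label[start_idx:end_idx]


# Toy data: 5000 samples with 12 leads each, and one label per sample.
signal = [[0.0] * 12 for _ in range(5000)]
label = [0] * 5000
cropped_signal, cropped_label = crop(signal, label)
```

Slicing along the first axis keeps the signal and label aligned sample by sample.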

get_signal(DATA_FOLDER, index)[source]#
Parameters:
  • DATA_FOLDER (str) – path to the data folder

  • index (int) – patient id

Returns:

signal, with shape (signal length, 12)

Return type:

np.array

map_annotaion_to_label_seq(annotation, sig_len)[source]#
Parameters:
  • annotation (dict) – annotation dict

  • sig_len (int) – signal length

Returns:

label_seq, the label sequence of integer class indices

Return type:

np.array
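
The structure of the annotation dict is not documented here. As an illustration only, the sketch below assumes a hypothetical layout of {class_name: [(start, end), ...]} and hypothetical class ids, and expands it into a per-sample label sequence:

```python
# Hypothetical class ids; 0 is reserved for unlabelled samples.
CLASS_IDS = {"p": 1, "N": 2, "t": 3}


def intervals_to_label_seq(annotation, sig_len):
    """Expand {class_name: [(start, end), ...]} into a per-sample label list."""
    label_seq = [0] * sig_len
    for name, intervals in annotation.items():
        for start, end in intervals:
            # Assumption: end is exclusive; indices are clamped to the signal.
            for i in range(max(start, 0), min(end, sig_len)):
                label_seq[i] = CLASS_IDS[name]
    return label_seq
```

For example, intervals_to_label_seq({"p": [(1, 3)], "t": [(4, 6)]}, 6) gives [0, 1, 1, 0, 3, 3].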

process_annotation(export_path, lead_name='i')[source]#

Process the annotation file and save it as JSON.

Parameters:
  • export_path (str) – path to save the annotation json file

  • lead_name (str) – lead name to extract annotation

validate_files()[source]#

Validate the number of files and the file types:
  1. Check that the files exist.
  2. Check that the file types are valid.
  3. Check that the number of files is valid.

Parameters:

data_dir (str) – path to the dataset folder

fusionlab.datasets.plot(signal, label_seq, sr=500, channel='v1')[source]#

Plot the signal with its annotation.

Utils#

Download file#

fusionlab.datasets.download_file(url, download_root, extract_root=None, filename=None, extract=False)[source]#

Download a file from a URL and optionally extract it to a target directory.

Parameters:
  • url (str) – URL to download the file from

  • download_root (str) – directory to place the downloaded file in

  • extract_root (str, optional) – directory to extract the downloaded file to

  • filename (str, optional) – name to save the file under. If None, use the basename of the URL

  • extract (bool, optional) – if True, extract the downloaded file; otherwise, do not extract

Return type:

None
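
The rule that filename falls back to the basename of the URL can be sketched with the standard library. This illustrates the documented behaviour rather than the library's actual code. The helper name resolve_download_path is hypothetical, and ignoring the query string is an assumption:

```python
import os
from urllib.parse import urlparse


def resolve_download_path(url, download_root, filename=None):
    """Return the local path a download would be saved to (hypothetical helper)."""
    if filename is None:
        # Assumption: use the last path component of the URL, without the query string.
        filename = os.path.basename(urlparse(url).path)
    return os.path.join(download_root, filename)
```

Passing an explicit filename is the safer choice when the URL ends in a slash, since the basename is then empty.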

HuggingFace Dataset#

class fusionlab.datasets.HFDataset(dataset)[source]#

Base Hugging Face dataset wrapper class.

Parameters:

dataset – a dataset object that provides a __getitem__ method
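
A wrapper of this kind usually just delegates indexing to the object it holds. The sketch below shows that pattern under the assumption that the wrapped object supports len() and indexing. WrappedDataset is an illustrative name, not the HFDataset implementation:

```python
class WrappedDataset:
    """Thin wrapper around any object that supports len() and indexing."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Delegate to the wrapped dataset; a subclass could post-process here.
        return self.dataset[index]
```

A subclass would typically override __getitem__ to convert the returned record into the tensors a model expects.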

LabelStudio Time series Segmentation Dataset#

class fusionlab.datasets.LSTimeSegDataset(data_dir, annotation_path, class_map, column_names)[source]#

Dataset for the Label Studio time series segmentation task.

__init__(data_dir, annotation_path, class_map, column_names)[source]#


Parameters:
  • data_dir (str) – directory of csv files

  • annotation_path (str) – path to annotation json file

  • class_map (dict) – a dictionary mapping class names to class indices

  • column_names (List[str]) – A list of column names for the signal data in the CSV files.

Examples::
>>> from fusionlab.datasets import LSTimeSegDataset
>>> ds = LSTimeSegDataset(data_dir="./12",
...                       annotation_path="./12.json",
...                       class_map={"N": 1, "p": 2, "t": 3},
...                       column_names=['i', 'ii', 'iii', 'avr', 'avl', 'avf', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6'])
>>> signals, mask = ds[0]

Read csv#

fusionlab.datasets.read_csv(fname)[source]#